HIGH LEVEL EXPRESSION OF PROTEINS

Field of the Invention
The invention concerns genes and methods for expressing eukaryotic
and viral proteins at high levels in eukaryotic cells.

Background of the Invention

Expression of eukaryotic gene products in prokaryotes is sometimes
limited by the presence of codons that are infrequently used in E. coli.
Expression of such genes can be enhanced by systematic substitution of the
endogenous codons with codons overrepresented in highly expressed
prokaryotic genes (Robinson et al., Nucleic Acids Res. 12:6663, 1984). It is
commonly supposed that rare codons cause pausing of the ribosome, which
leads to a failure to complete the nascent polypeptide chain and an uncoupling of
transcription and translation. Pausing of the ribosome is thought to lead to
exposure of the 3' end of the mRNA to cellular ribonucleases.

Summary of the Invention 

The invention features a synthetic gene encoding a protein normally 
expressed in a mammalian cell or other eukaryotic cell wherein at least one 
non-preferred or less preferred codon in the natural gene encoding the protein 
has been replaced by a preferred codon encoding the same amino acid. 

Preferred codons are: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys
(tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc);
Phe (ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg). Less preferred codons
are: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val (gtc); and Arg (agg). All
codons which do not fit the description of preferred codons or less preferred
codons are non-preferred codons. In general, the degree of preference of a
particular codon is indicated by the prevalence of the codon in highly expressed
human genes as indicated in Table 1 under the heading "High." For example,
"atc" represents 77% of the Ile codons in highly expressed mammalian genes
and is the preferred Ile codon; "att" represents 18% of the Ile codons in highly
expressed mammalian genes and is the less preferred Ile codon. The sequence
"ata" represents only 5% of the Ile codons in highly expressed human genes and
is a non-preferred Ile codon. Replacing a codon with another codon that is
more prevalent in highly expressed human genes will generally increase
expression of the gene in mammalian cells. Accordingly, the invention
includes replacing a less preferred codon with a preferred codon as well as
replacing a non-preferred codon with a preferred or less preferred codon.
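The three tiers defined above can be captured in a small lookup table. The Python sketch below is illustrative only. The helper names `tier` and `is_improvement` are hypothetical, and the codon lists are copied from this section. It assumes the caller supplies a synonymous replacement codon, because no translation check is made.

```python
from __future__ import annotations

# Codon lists copied from the text; keys are three-letter amino acid codes.
PREFERRED = {
    "Ala": "gcc", "Arg": "cgc", "Asn": "aac", "Asp": "gac", "Cys": "tgc",
    "Gln": "cag", "Gly": "ggc", "His": "cac", "Ile": "atc", "Leu": "ctg",
    "Lys": "aag", "Pro": "ccc", "Phe": "ttc", "Ser": "agc", "Thr": "acc",
    "Tyr": "tac", "Val": "gtg",
}
LESS_PREFERRED = {
    "Gly": "ggg", "Ile": "att", "Leu": "ctc",
    "Ser": "tcc", "Val": "gtc", "Arg": "agg",
}
# Lower rank means more preferred.
RANK = {"preferred": 0, "less preferred": 1, "non-preferred": 2}


def tier(amino_acid: str, codon: str) -> str:
    """Classify a codon for the given amino acid into one of the three tiers."""
    codon = codon.lower()
    if PREFERRED.get(amino_acid) == codon:
        return "preferred"
    if LESS_PREFERRED.get(amino_acid) == codon:
        return "less preferred"
    # By definition, every codon not listed above is non-preferred.
    return "non-preferred"


def is_improvement(amino_acid: str, old: str, new: str) -> bool:
    """True if replacing `old` with `new` moves the codon to a more preferred tier.

    Assumes `new` encodes the same amino acid as `old`; this is not checked.
    """
    return RANK[tier(amino_acid, new)] < RANK[tier(amino_acid, old)]


if __name__ == "__main__":
    for c in ("atc", "att", "ata"):
        print(c, tier("Ile", c))
```

For example, `tier("Ile", "ata")` places the 5% Ile codon in the non-preferred tier. `is_improvement("Ile", "ata", "att")` is true because "att" is the less preferred Ile codon.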

By "protein normally expressed in a mammalian cell" is meant a
protein which is expressed in mammalian cells under natural conditions. The term
includes genes in the mammalian genome such as those encoding Factor VIII,
Factor IX, interleukins, and other proteins. The term also includes genes which
are expressed in a mammalian cell under disease conditions such as oncogenes
as well as genes which are encoded by a virus (including a retrovirus) which
are expressed in mammalian cells post-infection. By "protein normally
expressed in a eukaryotic cell" is meant a protein which is expressed in a
eukaryote under natural conditions. The term also includes genes which are
expressed in a mammalian cell under disease conditions.

In preferred embodiments, the synthetic gene is capable of
expressing the mammalian or eukaryotic protein at a level which is at least
110%, 150%, 200%, 500%, 1,000%, 5,000% or even 10,000% of that
expressed by the "natural" (or "native") gene in an in vitro mammalian cell
culture system under identical conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring expression of the
synthetic gene and corresponding natural gene are described below. Other
suitable expression systems employing mammalian cells are well known to
those skilled in the art and are described in, for example, the standard
molecular biology reference works noted below. Vectors suitable for
expressing the synthetic and natural genes are described below and in the
standard reference works described below. By "expression" is meant protein
expression. Expression can be measured using an antibody specific for the
protein of interest. Such antibodies and measurement techniques are well
known to those skilled in the art. By "natural gene" and "native gene" is meant
the gene sequence (including naturally occurring allelic variants) which
naturally encodes the protein, i.e., the native or natural coding sequence.

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%, 
60%, 70%, 80%, or 90% of the codons in the natural gene are non-preferred 
codons. 

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%, 
60%, 70%, 80%, or 90% of the non-preferred codons in the natural gene are 
replaced with preferred codons or less preferred codons. 

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%, 
60%, 70%, 80%, or 90% of the non-preferred codons in the natural gene are 
replaced with preferred codons. 
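The percentage thresholds above can be computed directly from a coding sequence. This Python sketch is a hypothetical illustration and is not part of the described method. It uses the standard genetic code and takes the preferred and less preferred codons from the Summary. As a simplifying assumption, codons for Glu, Met, Trp and stop, for which no preferred codon is listed, are not counted as replaceable.

```python
from __future__ import annotations

# Standard genetic code; bases at each codon position are ordered t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Preferred / less preferred codons from the Summary (one-letter amino acid codes).
PREFERRED = {"A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
             "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
             "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
             "Y": "tac", "V": "gtg"}
LESS_PREFERRED = {"G": "ggg", "I": "att", "L": "ctc",
                  "S": "tcc", "V": "gtc", "R": "agg"}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into lowercase codons."""
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_non_preferred(codon: str) -> bool:
    aa = CODE[codon]
    # Assumption: amino acids with no listed preferred codon (Glu, Met, Trp)
    # and stop codons are treated as not replaceable.
    if aa not in PREFERRED:
        return False
    return codon not in (PREFERRED[aa], LESS_PREFERRED.get(aa))


def percent_non_preferred(natural: str) -> float:
    """Percentage of all codons in the natural gene that are non-preferred."""
    cs = codons(natural)
    return 100.0 * sum(is_non_preferred(c) for c in cs) / len(cs)


def percent_replaced(natural: str, synthetic: str) -> float:
    """Percentage of the natural gene's non-preferred codons that became
    preferred or less preferred codons at the same position."""
    nat, syn = codons(natural), codons(synthetic)
    if [CODE[c] for c in nat] != [CODE[c] for c in syn]:
        raise ValueError("synthetic gene must encode the same protein")
    pairs = [(n, s) for n, s in zip(nat, syn) if is_non_preferred(n)]
    if not pairs:
        return 0.0
    return 100.0 * sum(not is_non_preferred(s) for _, s in pairs) / len(pairs)
```

Consider the natural sequence "ata gct atc gaa" (Ile, Ala, Ile, Glu). Two of its four codons are non-preferred, so `percent_non_preferred` gives 50.0. Changing only "ata" to "atc" replaces one of those two codons, so `percent_replaced` also gives 50.0.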

In a preferred embodiment the protein is a retroviral protein. In a
more preferred embodiment the protein is a lentiviral protein. In an even more
preferred embodiment the protein is an HIV protein. In other preferred
embodiments the protein is gag, pol, env, gp120, or gp160. In other preferred
embodiments the protein is a human protein. In more preferred embodiments,
the protein is human Factor VIII or B region-deleted human Factor VIII. In
another preferred embodiment the protein is green fluorescent protein.

In various preferred embodiments at least 30%, 40%, 50%, 60%,
70%, 80%, 90%, or 95% of the codons in the synthetic gene are preferred or
less preferred codons.

The invention also features an expression vector comprising the
synthetic gene.

In another aspect the invention features a cell harboring the synthetic
gene. In various preferred embodiments the cell is a prokaryotic cell or a
mammalian cell.

In preferred embodiments the synthetic gene includes fewer than 50,
fewer than 40, fewer than 30, fewer than 20, fewer than 10, fewer than 5, or no
"CG" sequences.

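Checking whether a coding sequence meets one of the "fewer than N CG sequences" limits above requires only a simple count. The following is a minimal Python sketch; the function names are hypothetical.

```python
from __future__ import annotations


def count_cg(seq: str) -> int:
    """Number of CG (CpG) dinucleotides in a DNA sequence, case-insensitive."""
    # "cg" cannot overlap itself, so str.count gives the exact number.
    return seq.lower().count("cg")


def cg_limits_met(seq: str, limits=(50, 40, 30, 20, 10, 5)) -> list[int]:
    """The 'fewer than N' limits from the text that the sequence satisfies."""
    n = count_cg(seq)
    return [limit for limit in limits if n < limit]


if __name__ == "__main__":
    print(count_cg("gccggccgc"))  # 2
```

A sequence with zero CG dinucleotides satisfies the strictest embodiment, "no CG sequences".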
The invention also features a method for preparing a synthetic gene
encoding a protein normally expressed by a mammalian cell or other eukaryotic
cell. The method includes identifying non-preferred and less-preferred codons
in the natural gene encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred codon encoding the same
amino acid as the replaced codon.
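The identify-and-replace method above can be sketched as follows. This is an illustrative Python sketch under stated assumptions and is not the procedure used to build the genes described here. It replaces every codon whose amino acid has a listed preferred codon and leaves Glu, Met, Trp and stop codons untouched. It then checks that the encoded protein is unchanged. The names `translate` and `replace_with_preferred` are hypothetical.

```python
from __future__ import annotations

# Standard genetic code; bases at each codon position are ordered t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Preferred codons from the Summary (one-letter amino acid codes).
PREFERRED = {"A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
             "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
             "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
             "Y": "tac", "V": "gtg"}


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into lowercase codons."""
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODE[c] for c in codons(seq))


def replace_with_preferred(natural: str) -> str:
    """Swap every codon that has a listed preferred synonym for that synonym."""
    synthetic = "".join(PREFERRED.get(CODE[c], c) for c in codons(natural))
    # Sanity check: codon replacement must not change the encoded protein.
    if translate(synthetic) != translate(natural):
        raise RuntimeError("translation changed")
    return synthetic


if __name__ == "__main__":
    print(replace_with_preferred("atggctagaatatcttaa"))  # atggcccgcatcagctaa
```

In the usage line, gct, aga, ata and tct are exchanged for gcc, cgc, atc and agc, while atg (Met) and taa (stop) are kept. Both sequences translate to "MARIS*".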

Under some circumstances (e.g., to permit introduction of a
restriction site) it may be desirable to replace a non-preferred codon with a less
preferred codon rather than a preferred codon.

It is not necessary to replace all less preferred or non-preferred
codons with preferred codons. Increased expression can be accomplished even
with partial replacement of less preferred or non-preferred codons with
preferred codons. Under some circumstances it may be desirable to only
partially replace non-preferred codons with preferred or less preferred codons
in order to obtain an intermediate level of expression.
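Partial replacement can be sketched by limiting how many eligible codons are changed. In this hypothetical Python sketch, a codon is eligible if its amino acid has a listed preferred codon that it does not already use. The first `round(fraction * n)` eligible codons, read 5' to 3', are then replaced. The text does not specify which codons to choose, so this ordering is purely an assumption.

```python
from __future__ import annotations

# Standard genetic code; bases at each codon position are ordered t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Preferred codons from the Summary (one-letter amino acid codes).
PREFERRED = {"A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
             "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
             "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
             "Y": "tac", "V": "gtg"}


def codons(seq: str) -> list[str]:
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def partially_optimize(natural: str, fraction: float) -> str:
    """Replace about `fraction` of the eligible codons, 5' to 3' (hypothetical rule)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    cs = codons(natural)
    # Eligible: the amino acid has a listed preferred codon that is not yet used.
    eligible = [i for i, c in enumerate(cs)
                if CODE[c] in PREFERRED and c != PREFERRED[CODE[c]]]
    for i in eligible[:round(fraction * len(eligible))]:
        cs[i] = PREFERRED[CODE[cs[i]]]
    return "".join(cs)


if __name__ == "__main__":
    print(partially_optimize("gctgctgctgct", 0.5))  # gccgccgctgct
```

Replacing codons randomly, or only within chosen regions, would be equally valid readings of "partial replacement". The 5'-first rule is used here only because it is deterministic.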

In other preferred embodiments the invention features vectors
(including expression vectors) comprising one or more of the synthetic genes.

By "vector" is meant a DNA molecule, derived, e.g., from a plasmid, 
bacteriophage, or mammalian or insect virus, into which fragments of DNA 
may be inserted or cloned. A vector will contain one or more unique restriction 
sites and may be capable of autonomous replication in a defined host or vehicle 
organism such that the cloned sequence is reproducible. Thus, by "expression 
vector" is meant any autonomous element capable of directing the synthesis of 
a protein. Such DNA expression vectors include mammalian plasmids and 
viruses. 

The invention also features synthetic gene fragments which encode a 
desired portion of the protein. Such synthetic gene fragments are similar to the 
synthetic genes of the invention except that they encode only a portion of the 
protein. Such gene fragments preferably encode at least 50, 100, 150, or 500 
contiguous amino acids of the protein. 

In constructing the synthetic genes of the invention it may be 
desirable to avoid CpG sequences as these sequences may cause gene silencing. 
Thus, in a preferred embodiment the coding region of the synthetic gene does 
not include the sequence "CG."
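Preferred codons and CG avoidance can pull in different directions. The listed preferred Arg codon, cgc, itself contains a CG, and a codon ending in c followed by one starting with g creates a CG at the junction. One hypothetical way to reconcile the two is a greedy rule. Take the preferred codon unless it creates a CG within the codon or across the junction with the preceding codon. In that case fall back to the listed less preferred codon, and keep the preferred codon if neither option avoids a CG. The Python sketch below illustrates that rule; it is not a method stated in this document.

```python
from __future__ import annotations

# Standard genetic code; bases at each codon position are ordered t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

PREFERRED = {"A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
             "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
             "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
             "Y": "tac", "V": "gtg"}
LESS_PREFERRED = {"G": "ggg", "I": "att", "L": "ctc",
                  "S": "tcc", "V": "gtc", "R": "agg"}


def codons(seq: str) -> list[str]:
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODE[c] for c in codons(seq))


def optimize_avoiding_cg(natural: str) -> str:
    out = []
    for codon in codons(natural):
        aa = CODE[codon]
        # Candidates in order of preference; keep the native codon if none is listed.
        options = [c for c in (PREFERRED.get(aa), LESS_PREFERRED.get(aa)) if c] or [codon]
        prev_base = out[-1][-1] if out else ""
        # First candidate with no CG inside it or across the junction with the
        # previous codon; otherwise fall back to the top candidate.
        chosen = next((c for c in options if "cg" not in prev_base + c), options[0])
        out.append(chosen)
    return "".join(out)


if __name__ == "__main__":
    print(optimize_avoiding_cg("gctggtcgt"))  # gccggcagg
```

Consider the sequence "gct ggt cgt" (Ala, Gly, Arg). Straight preferred-codon replacement gives "gccggccgc", which contains two CG dinucleotides. The greedy version gives "gccggcagg", which contains one. The Gly junction CG remains because both listed Gly codons begin with g.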

The codon bias present in the HIV gp120 env gene is also present in
the gag and pol genes. Thus, replacement of a portion of the non-preferred and
less preferred codons found in these genes with preferred codons should 
produce a gene capable of higher level expression. A large fraction of the 
codons in the human genes encoding Factor VIII and Factor IX are non- 
preferred codons or less preferred codons. Replacement of a portion of these 
codons with preferred codons should yield genes capable of higher level 
expression in mammalian cell culture. 

The synthetic genes of the invention can be introduced into the cells
of a living organism. For example, vectors (viral or non-viral) can be used to
introduce a synthetic gene into cells of a living organism for gene therapy.

Conversely, it may be desirable to replace preferred codons in a
naturally occurring gene with less-preferred codons as a means of lowering
expression.

Standard reference works describing the general principles of
recombinant DNA technology include Watson et al., Molecular Biology of the
Gene, Volumes I and II, the Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell et al., Molecular Cell Biology,
Scientific American Books, Inc., publisher, New York, NY (1986); Old et al.,
Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2d
edition, University of California Press, publisher, Berkeley, CA (1981);
Maniatis et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring
Harbor Laboratory, publisher, Cold Spring Harbor, NY (1989); and Current
Protocols in Molecular Biology, Ausubel et al., Wiley Press, New York, NY
(1992).

By "transformed cell" is meant a cell into which (or into an ancestor 
of which) has been introduced, by means of recombinant DNA techniques, a 
selected DNA molecule, e.g., a synthetic gene. 

By "positioned for expression" is meant that a DNA molecule, e.g., a 
synthetic gene, is positioned adjacent to a DNA sequence which directs 
transcription and translation of the sequence (i.e., facilitates the production of
the protein encoded by the synthetic gene).
Description of the Drawings

Figure 1 depicts the sequence of the synthetic gp120 and a synthetic
gp160 gene in which codons have been replaced by those found in highly
expressed human genes.

Figure 2 is a schematic drawing of the synthetic gp120 (HIV-1 MN)
gene. The shaded portions marked v1 to v5 indicate hypervariable regions. The
filled box indicates the CD4 binding site. A limited number of the unique
restriction sites are shown: H (HindIII), Nh (NheI), P (PstI), Na (NaeI), M
(MluI), R (EcoRI), A (AgeI) and No (NotI). The chemically synthesized
DNA fragments which served as PCR templates are shown below the gp120
sequence, along with the locations of the primers used for their amplification.

Figure 3 is a photograph of the results of transient transfection assays
used to measure gp120 expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of HIV-1
(gp120mn), by the MN isolate of HIV-1 modified by substitution of the
endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or
by the chemically synthesized gene encoding the MN variant of HIV-1 with the
human CD5 leader (syngp120mn). Supernatants were harvested 60 hours
post-transfection, following a 12 hour labeling period, and immunoprecipitated with
CD4:IgG1 fusion protein and protein A sepharose.

Figure 4 is a graph depicting the results of ELISA assays used to
measure protein levels in supernatants of transiently transfected 293T cells.
Supernatants of 293T cells transfected with plasmids expressing gp120
encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of HIV-1
(gp120mn), by the MN isolate of HIV-1 modified by substitution of the
endogenous leader peptide with that of CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant of HIV-1 with
human CD5 leader (syngp120mn) were harvested after 4 days and tested in a
gp120/CD4 ELISA. The level of gp120 is expressed in ng/ml.

Figure 5A is a photograph of a gel illustrating the results of an
immunoprecipitation assay used to measure expression of the native and
synthetic gp120 in the presence of rev in trans and the RRE in cis. In this
experiment 293T cells were transiently transfected by calcium phosphate
co-precipitation of 10 µg of plasmid expressing: (A) the synthetic gp120MN
sequence and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, (C) the gp120
portion of HIV-1 IIIB and RRE in cis, all in the presence or absence of rev
expression. The RRE constructs gp120IIIbRRE and syngp120mnRRE were
generated using an EagI/HpaI RRE fragment cloned by PCR from an HIV-1
HXB2 proviral clone. Each gp120 expression plasmid was cotransfected with
10 µg of either pCMVrev or CDM7 plasmid DNA. Supernatants were
harvested 60 hours post-transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing SDS-PAGE. The gel
exposure time was extended to allow the induction of gp120IIIbRRE by rev to be
demonstrated.
Figure 5B is a shorter exposure of a similar experiment in which
syngp120mnRRE was cotransfected with or without pCMVrev.

Figure 5C is a schematic diagram of the constructs used in Figure
5A.

Figure 6 is a comparison of the sequence of the wild-type ratTHY-1
gene (wt) and a synthetic ratTHY-1 gene (env) constructed by chemical
synthesis and having the most prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic ratTHY-1 gene. The
solid black box denotes the signal peptide. The shaded box denotes the
sequences in the precursor which direct the attachment of a phosphatidylinositol
glycan anchor. Unique restriction sites used for assembly of the
THY-1 constructs are marked H (HindIII), M (MluI), S (SacI) and No (NotI).
The positions of the synthetic oligonucleotides employed in the construction are
shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow cytometry analysis.
In this experiment 293T cells were transiently transfected by calcium
phosphate co-precipitation with either a wild-type ratTHY-1 expression plasmid
(thick line), a ratTHY-1 with envelope codons expression plasmid (thin line), or
vector only (dotted line). Cells were stained with anti-ratTHY-1 monoclonal
antibody OX7 followed by a polyclonal FITC-conjugated anti-mouse IgG
antibody 3 days after transfection.

Figure 9A is a photograph of a gel illustrating the results of
immunoprecipitation analysis of supernatants of human 293T cells transfected
with either syngp120mn (A) or a construct syngp120mn.rTHY-1env which has
the rTHY-1env gene in the 3' untranslated region of the syngp120mn gene (B).
The syngp120mn.rTHY-1env construct was generated by inserting a NotI
adapter into the blunted HindIII site of the rTHY-1env plasmid. Subsequently,
a 0.5 kb NotI fragment containing the rTHY-1env gene was cloned into the
NotI site of the syngp120mn plasmid and tested for correct orientation.
Supernatants of 35S-labeled cells were harvested 72 hours post-transfection,
precipitated with CD4:IgG fusion protein and protein A agarose, and run on a
7% reducing SDS-PAGE.

Figure 9B is a schematic diagram of the constructs used in the 
experiment depicted in Figure 9A. 

Figure 10A is a photograph of COS cells transfected with vector only
showing no GFP fluorescence.

Figure 10B is a photograph of COS cells transfected with a CDM7
expression plasmid encoding native GFP engineered to include a consensus
translational initiation sequence.

Figure 10C is a photograph of COS cells transfected with an
expression plasmid having the same flanking sequences and initiation
consensus as in Figure 10B, but bearing a codon optimized gene sequence.

Figure 10D is a photograph of COS cells transfected with an
expression plasmid as in Figure 10C, but bearing a Thr at residue 65 in place of
Ser.

Figure 11 depicts the sequence of a synthetic gene encoding green
fluorescent protein (SEQ ID NO:40).

Figure 12 depicts the sequence of a native human Factor VIII gene
lacking the central B domain (amino acids 760-1639, inclusive) (SEQ ID
NO:41).

Figure 13 depicts the sequence of a synthetic human Factor VIII
gene lacking the central B domain (amino acids 760-1639, inclusive) (SEQ ID
NO:42).

Description of the Preferred Embodiments

EXAMPLE 1

Construction of a Synthetic gp120 Gene Having Codons Found in Highly
Expressed Human Genes

A codon frequency table for the envelope precursor of the LAV
subtype of HIV-1 was generated using software developed by the University of
Wisconsin Genetics Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by a collection of highly
expressed human genes. For any amino acid encoded by degenerate codons,
the most favored codon of the highly expressed genes is different from the most
favored codon of the HIV envelope precursor. Moreover, a simple rule
describes the pattern of favored envelope codons wherever it applies: preferred
codons maximize the number of adenine residues in the viral RNA. In all cases
but one this means that the codon in which the third position is A is the most
frequently used. In the special case of serine, three codons equally contribute
one A residue to the mRNA; together these three comprise 85% of the serine
codons actually used in envelope transcripts. A particularly striking example of
the A bias is found in the codon choice for arginine, in which the AGA triplet
comprises 88% of the arginine codons. In addition to the preponderance of A
residues, a marked preference is seen for uridine among degenerate codons
whose third residue must be a pyrimidine. Finally, the inconsistencies among
the less frequently used variants can be accounted for by the observation that
the dinucleotide CpG is underrepresented; thus the third position is less likely to
be G whenever the second position is C, as in the codons for alanine, proline,
serine and threonine; and the CGX triplets for arginine are hardly used at all.
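A tabulation of the kind summarized in Table 1 can be sketched in a few lines. This Python sketch is not the GCG program mentioned above. It is a hypothetical re-implementation that reports, for each codon seen, its percentage among the codons used for the same amino acid. It also reports the fraction of codons ending in A, the third-position bias described for the HIV-1 envelope. It assumes a single in-frame coding sequence.

```python
from __future__ import annotations

from collections import Counter

# Standard genetic code; bases at each codon position are ordered t, c, a, g.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons(seq: str) -> list[str]:
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_usage(seq: str) -> dict[str, float]:
    """Percentage of each amino acid's codons accounted for by each codon seen."""
    counts = Counter(codons(seq))
    per_aa: Counter = Counter()
    for codon, n in counts.items():
        per_aa[CODE[codon]] += n
    return {codon: 100.0 * n / per_aa[CODE[codon]] for codon, n in counts.items()}


def third_position_a(seq: str) -> float:
    """Fraction of codons whose third position is A."""
    cs = codons(seq)
    return sum(c[2] == "a" for c in cs) / len(cs)


if __name__ == "__main__":
    print(codon_usage("agaagaagaagg"))  # {'aga': 75.0, 'agg': 25.0}
```

Applied to a whole env reading frame, `codon_usage` would give numbers comparable to the "Env" column. Applied to a pooled set of highly expressed genes, it would give numbers comparable to the "High" column.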
TABLE 1: Codon frequency in the HIV-1 IIIb env gene and in highly
expressed human genes.

Amino acid   Codon   High   Env
[Rows for the remaining amino acids are not legible in the source copy.]
Asp          GAC      75     33
             GAT      25     67
His          CAC      79     25
             CAT      21     75
Pro          CCC      48     27
             CCT      19     14
             CCA      16     55
             CCG      17      5
Tyr          TAC      74      8
             TAT      26     92
Phe          TTC      80     26
             TTT      20     74
Val          GTC      25     12
             GTT       7      9
             GTA       5     62
             GTG      64     18
Codon frequency was calculated using the GCG program established by the
University of Wisconsin Genetics Computer Group. Numbers represent the
percentage of cases in which the particular codon is used. Codon usage
frequencies of envelope genes of other HIV-1 virus isolates are comparable and
show a similar bias.



In order to produce a gp120 gene capable of high level expression in
mammalian cells, a synthetic gene encoding the gp120 segment of HIV-1 was
constructed (syngp120mn), based on the sequence of the most common North
American subtype, HIV-1 MN (Shaw et al., Science 226:1165, 1984; Gallo et
al., Nature 321:119, 1986). In this synthetic gp120 gene nearly all of the native
codons have been systematically replaced with codons most frequently used in
highly expressed human genes (Figure 1). This synthetic gene was assembled
from chemically synthesized oligonucleotides of 150 to 200 bases in length. If
oligonucleotides exceeding 120 to 150 bases are chemically synthesized, the
percentage of full-length product can be low, and the vast excess of material
consists of shorter oligonucleotides. Since these shorter fragments inhibit
cloning and PCR procedures, it can be very difficult to use oligonucleotides
exceeding a certain length. In order to use crude synthesis material without
prior purification, single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose gels and used as
templates in the next PCR step. Two adjacent fragments could be co-amplified
because of overlapping sequences at the end of either fragment. These
fragments, which were between 350 and 400 bp in size, were subcloned into a
pCDM7-derived plasmid containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI polylinker. Each of
the restriction enzymes in this polylinker represents a site that is present at
either the 5' or 3' end of the PCR-generated fragments. Thus, by sequential
subcloning of each of the 4 long fragments, the whole gp120 gene was
assembled. For each fragment three to six different clones were subcloned and
sequenced prior to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in Figure 2. The sequence of the
synthetic gp120 gene (and a synthetic gp160 gene created using the same
approach) is presented in Figure 1.
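The fragment-and-overlap assembly described above can be planned by tiling the target sequence into overlapping windows. In this hypothetical Python sketch, the 200-base maximum reflects the 150 to 200 base oligonucleotides mentioned in the text. The 20-base overlap is an arbitrary illustrative choice. Primer design, strand selection and melting temperatures are not modeled.

```python
from __future__ import annotations


def tile(length: int, max_len: int, overlap: int) -> list[tuple[int, int]]:
    """Half-open windows covering [0, length), each at most max_len long,
    with consecutive windows sharing `overlap` bases."""
    if not 0 <= overlap < max_len:
        raise ValueError("need 0 <= overlap < max_len")
    windows, start = [], 0
    while True:
        end = min(start + max_len, length)
        windows.append((start, end))
        if end >= length:
            return windows
        # Step back by `overlap` so adjacent fragments can anneal and co-amplify.
        start = end - overlap


def fragments(seq: str, max_len: int = 200, overlap: int = 20) -> list[str]:
    """Cut a sequence into overlapping pieces (hypothetical parameter defaults)."""
    return [seq[s:e] for s, e in tile(len(seq), max_len, overlap)]


if __name__ == "__main__":
    print(tile(400, 200, 20))  # [(0, 200), (180, 380), (360, 400)]
```

Each window starts `max_len - overlap` bases after the previous one. The loop therefore always advances and terminates, and the last window is simply shorter when the length does not divide evenly.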

The mutation rate was considerable. The most commonly found
mutations were short (1 nucleotide) and long (up to 30 nucleotides) deletions.
In some cases it was necessary to exchange parts with either synthetic adapters
or pieces from other subclones without mutation in that particular region.
Some deviations from strict adherence to optimized codon usage were made to
accommodate the introduction of restriction sites into the resulting gene to
facilitate the replacement of various segments (Figure 2). These unique
restriction sites were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged with the highly
efficient leader peptide of the human CD5 antigen to facilitate secretion
(Aruffo et al., Cell 61:1303, 1990). The plasmid used for construction is a
derivative of the mammalian expression vector pCDM7 transcribing the
inserted gene under the control of a strong human CMV immediate early
promoter.
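Checking that engineered restriction sites are unique, and roughly evenly spaced, is a string search. The Python sketch below uses the standard recognition sequences of the enzymes named in Figure 2; the helper names are hypothetical. Each of these sites is palindromic, so scanning one strand also finds sites on the complementary strand.

```python
from __future__ import annotations

# Standard recognition sequences (5' to 3') of the enzymes named in Figure 2.
SITES = {
    "HindIII": "aagctt", "NheI": "gctagc", "PstI": "ctgcag", "NaeI": "gccggc",
    "MluI": "acgcgt", "EcoRI": "gaattc", "AgeI": "accggt", "NotI": "gcggccgc",
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence, including overlapping ones."""
    seq, site = seq.lower(), site.lower()
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def unique_sites(seq: str) -> dict[str, int]:
    """Enzymes from SITES that cut the sequence exactly once, with the position."""
    result = {}
    for name, site in SITES.items():
        hits = find_sites(seq, site)
        if len(hits) == 1:
            result[name] = hits[0]
    return result


def spacings(positions: list[int]) -> list[int]:
    """Distances between consecutive site positions (e.g. to check ~100 bp intervals)."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    demo = "gaattc" + "a" * 94 + "acgcgt" + "a" * 94 + "gaattc"
    print(unique_sites(demo))  # {'MluI': 100}
```

In the demo sequence, EcoRI occurs twice and is therefore not usable as a unique cassette boundary. MluI occurs once, at position 100.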

To compare the wild-type and synthetic gp120 coding sequences, the
synthetic gp120 coding sequence was inserted into a mammalian expression
vector and tested in transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations in expression levels
between different virus isolates and artifacts induced by distinct leader
sequences. The gp120 HIV IIIb construct used as control was generated by
PCR, using a SalI/XhoI HIV-1 HXB2 envelope fragment as template. To
exclude PCR-induced mutations, a KpnI/EarI fragment containing
approximately 1.2 kb of the gene was exchanged with the respective sequence
from the proviral clone. The wild-type gp120mn constructs used as controls
were cloned by PCR from HIV-1 MN infected C8166 cells (AIDS Repository,
Rockville, MD) and expressed gp120 either with a native envelope or a CD5
leader sequence. Since proviral clones were not available in this case, two
clones of each construct were tested to avoid PCR artifacts. To determine the
amount of secreted gp120 semi-quantitatively, supernatants of 293T cells
transiently transfected by calcium phosphate co-precipitation were
immunoprecipitated with soluble CD4:immunoglobulin fusion protein and
protein A sepharose.

The results of this analysis (Figure 3) show that the synthetic gene
product is expressed at a very high level compared to that of the native gp120
controls. The molecular weight of the synthetic gp120 gene product was comparable
to that of the control proteins (Figure 3) and appeared to be in the range of 100 to
110 kd. The slightly faster migration can be explained by the fact that in some tumor
cell lines, e.g., 293T, glycosylation is either not complete or altered to some
extent.

To compare expression more accurately, gp120 protein levels were
quantitated using a gp120 ELISA with CD4 in the immobilized phase. This
analysis shows (Figure 4) that ELISA data were comparable to the
immunoprecipitation data, with a gp120 concentration of approximately 125
ng/ml for the synthetic gp120 gene, and less than the background cutoff (5
ng/ml) for all the native gp120 genes. Thus, expression of the synthetic gp120
gene appears to be at least one order of magnitude higher than that of the
wild-type gp120 genes. In the experiment shown the increase was at least 25-fold.
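The 25-fold figure above is a lower bound. The native genes gave readings below the 5 ng/ml cutoff, so their true level is only known to be less than 5 ng/ml, and 125 / 5 = 25. The following is a minimal Python sketch of this reasoning; the function name is hypothetical.

```python
from __future__ import annotations


def min_fold_increase(synthetic: float, native: float, detection_limit: float) -> float:
    """Lower bound on the synthetic/native expression ratio.

    A native reading below the cutoff is only known to be < cutoff, so
    dividing by the cutoff gives a conservative (minimum) ratio. A native
    reading above the cutoff is used as measured.
    """
    return synthetic / max(native, detection_limit)


if __name__ == "__main__":
    print(min_fold_increase(125.0, 0.0, 5.0))  # 25.0
```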
The Role of rev in gp120 Expression

Since rev appears to exert its effect at several steps in the expression
of a viral transcript, the possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was tested. First, to rule out
the possibility that negative signal elements conferring either increased mRNA
degradation or nuclear retention were eliminated by changing the nucleotide
sequence, cytoplasmic mRNA levels were tested. Cytoplasmic RNA was
prepared by NP40 lysis of transiently transfected 293T cells and subsequent
elimination of the nuclei by centrifugation. Cytoplasmic RNA was
subsequently prepared from lysates by multiple phenol extractions and
precipitation, spotted on nitrocellulose using a slot blot apparatus, and finally
hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA from 293 cells transfected with CDM7,
gp120IIIb, or syngp120 was isolated 36 hours post-transfection. Cytoplasmic
RNA of HeLa cells infected with wild-type vaccinia virus or recombinant virus
expressing gp120IIIb or the synthetic gp120 gene under the control of the
7.5k promoter was isolated 16 hours post-infection. Equal amounts were spotted
on nitrocellulose using a slot blot device and hybridized with randomly labeled
1.5 kb gp120IIIb and syngp120 fragments or human beta-actin. RNA
expression levels were quantitated by scanning the hybridized membranes with
a phosphorimager. The procedures used are described in greater detail below.

This experiment demonstrated that there was no significant
difference in the mRNA levels of cells transfected with either the native or
synthetic gp120 gene. In fact, in some experiments the cytoplasmic mRNA level
of the synthetic gp120 gene was even lower than that of the native gp120 gene.

These data were confirmed by measuring expression from
recombinant vaccinia viruses. Human 293 cells or HeLa cells were infected
with vaccinia virus expressing wild-type gp120IIIb or syngp120mn at a
multiplicity of infection of at least 10. Supernatants were harvested 24 hours
post-infection and immunoprecipitated with CD4:immunoglobulin fusion protein
and protein A sepharose. The procedures used in this experiment are described
in greater detail below.

This experiment showed that the increased expression of the
synthetic gene was still observed when the endogenous gene product and the
synthetic gene product were expressed from vaccinia virus recombinants under
the control of the strong mixed early and late 7.5k promoter. Because vaccinia
virus mRNAs are transcribed and translated in the cytoplasm, increased
expression of the synthetic envelope gene in this experiment cannot be
attributed to improved export from the nucleus. This experiment was repeated
in two additional human cell types, the kidney cancer cell line 293 and HeLa
cells. As with transfected 293T cells, mRNA levels were similar in 293 cells
infected with either recombinant vaccinia virus.
Codon Usage in Lentiviruses

Because it appears that codon usage has a significant impact on
expression in mammalian cells, the codon frequency in the envelope genes of
other retroviruses was examined. This study found no clear pattern of codon
preference between retroviruses in general. However, if viruses from the
lentivirus genus, to which HIV-1 belongs, were analyzed separately, codon
usage bias almost identical to that of HIV-1 was found. A codon frequency
table from the envelope glycoproteins of a variety of (predominantly type C)
retroviruses excluding the lentiviruses was prepared and compared with a codon
frequency table created from the envelope sequences of four lentiviruses not
closely related to HIV-1 (caprine arthritis encephalitis virus, equine infectious
anemia virus, feline immunodeficiency virus, and visna virus) (Table 2). The
codon usage pattern for lentiviruses is strikingly similar to that of HIV-1: in all
cases but one, the preferred codon for HIV-1 is the same as the preferred codon
for the other lentiviruses. The exception is proline, which is encoded by CCT
in 41% of non-HIV lentiviral envelope residues, and by CCA in 40% of
residues, a situation which clearly also reflects a significant preference for the
triplet ending in A. The pattern of codon usage by the non-lentiviral envelope
proteins does not show a similar predominance of A residues, and is also not as
skewed toward third position C and G residues as is the codon usage for the
highly expressed human genes. In general, non-lentiviral retroviruses appear to
exploit the different codons more equally, a pattern they share with less highly
expressed human genes.
TABLE 2
Codon frequency in the envelope genes of lentiviruses (Lenti) and
non-lentiviral retroviruses (Other)

Amino acid   Codon   Other   Lenti
Ala          GCC       45      13
             GCT       26      37
             GCA       20       …
             GCG        …       …
Cys          TGC       53      21
             TGT       47      79
Gln          CAA       52      69
             CAG       48      31
Glu          GAA       57      68
             GAG       43      32
Gly          GGC       21       8
             GGT       13       …
             GGA       37      56
             GGG       29      26
Ile          ATC       38      16
             ATT       31      22
             ATA       31      61
Leu          CTC       22       8
             CTT       14       9
             CTA       21      16
             CTG       19      11
             TTA       15      41
             TTG       10      16
Phe          TTC       52      25
             TTT       48      75
Ser          TCC       38      10
             TCT       17      16
             TCA       18      24
             TCG        6       5
             AGC       13      20
             AGT        7      25
Tyr          TAC       48      28
             TAT       52      72
Val          GTC       36       9
             GTT       17      10
             GTA       22      54
             GTG       25      27
Arg, Asn, Asp, His, Lys, Pro, Thr       …       …

Codon frequency was calculated using the GCG program developed by the
University of Wisconsin Genetics Computer Group. Numbers represent the
percentage with which a particular codon is used among the codons for the
same amino acid. Codon usage of non-lentiviral retroviruses was compiled
from the envelope precursor sequences of bovine leukemia virus, feline
leukemia virus, human T-cell leukemia virus type I, human T-cell
lymphotropic virus type II, the mink cell focus-forming isolate of murine
leukemia virus (MuLV), the Rauscher spleen focus-forming isolate, the 10A1
isolate, the 4070A amphotropic isolate and the myeloproliferative leukemia
virus isolate, and from rat leukemia virus, simian sarcoma virus, simian T-cell
leukemia virus, leukemogenic retrovirus T1223/B and gibbon ape leukemia
virus. The codon frequency tables for the non-HIV, non-SIV lentiviruses were
compiled from the envelope precursor sequences of caprine arthritis
encephalitis virus, equine infectious anemia virus, feline immunodeficiency
virus, and visna virus.
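Because each number in Table 2 is a within-amino-acid percentage, the values for each amino acid sum to about 100. The Python sketch below shows that calculation for an in-frame coding sequence. It is not the GCG program used here; the function name `synonymous_codon_percentages` is hypothetical, and a trailing partial codon is simply ignored.

```python
from collections import Counter, defaultdict

# Standard genetic code, indexed in TCAG order (stop codons map to "*").
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_codon_percentages(cds):
    """Percentage of each codon among all codons for the same amino acid."""
    seq = cds.upper().replace("U", "T")
    # Split into in-frame codons, dropping any incomplete codon at the end.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    counts = Counter(codons)
    per_amino_acid = defaultdict(int)
    for codon, n in counts.items():
        per_amino_acid[CODE[codon]] += n
    return {
        codon: 100.0 * n / per_amino_acid[CODE[codon]]
        for codon, n in counts.items()
    }


# Toy sequence: four Ala codons (GCC, GCA, GCA, GCT) and one Cys codon (TGT).
print(synonymous_codon_percentages("GCCGCAGCAGCTTGT"))
# prints: {'GCC': 25.0, 'GCA': 50.0, 'GCT': 25.0, 'TGT': 100.0}
```

Applied to a pooled set of envelope sequences, the same per-amino-acid normalization yields columns like "Other" and "Lenti" above.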



In addition to the prevalence of codons containing an A, lentiviral
codons adhere to the HIV pattern of strong CpG underrepresentation, so that
the third position of alanine, proline, serine and threonine triplets is rarely G.
The non-lentiviral retroviral envelope triplets show a similar, but less
pronounced, underrepresentation of CpG. The most obvious difference between
lentiviruses and other retroviruses with respect to CpG prevalence lies in the
usage of the CGX variant of arginine triplets, which is reasonably frequently
represented among the retroviral envelope coding sequences but is almost
never present among the comparable lentivirus sequences.

WO 98/12207 PCT/US97/16639
Differences in rev Dependence Between Native and Synthetic gp120

To examine whether regulation by rev is connected to HIV-1 codon
usage, the influence of rev on the expression of both the native and the
synthetic gene was investigated. Since regulation by rev requires the
rev-binding site (RRE) in cis, constructs were made in which this binding site
was cloned into the 3' untranslated region of both the native and the synthetic
gene. These plasmids were co-transfected into 293T cells with rev or a control
plasmid in trans, and gp120 expression levels in supernatants were measured
semiquantitatively by immunoprecipitation. The procedures used in this
experiment are described in greater detail below.

As shown in Figure 5A and Figure 5B, rev upregulates the native
gp120 gene but has no effect on the expression of the synthetic gp120 gene.
Thus, the action of rev is not apparent on a substrate that lacks the endogenous
viral envelope coding sequence.

Expression of a synthetic ratTHY-1 gene with HIV envelope codons

The experiment described above suggests that envelope coding
sequences must in fact be present for rev regulation. To test this hypothesis, a
synthetic version of the gene encoding the rat THY-1 antigen (ratTHY-1), a
small and typically highly expressed cell surface protein, was prepared. The
synthetic ratTHY-1 gene was designed to have a codon usage like that of HIV
gp120. In designing this synthetic gene, AUUUA sequences, which are
associated with mRNA instability, were avoided. In addition, two restriction
sites were introduced to simplify manipulation of the resulting gene (Figure 6).
This synthetic gene with the HIV envelope codon usage (rTHY-1env) was
generated using three 150 to 170 mer oligonucleotides (Figure 7). In contrast
to the syngp120mn gene, PCR products were directly cloned and assembled in
pUC12, and subsequently cloned into pCDM7.
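The design rule described above, choosing codons according to a target usage while avoiding AUUUA (ATTTA on the coding strand), can be sketched as a greedy back-translation. The ranked codon lists below are hypothetical placeholders, not the codon choices actually used for rTHY-1env, and because the greedy choice only looks backwards it can fail where a design with lookahead would succeed.

```python
MOTIF = "ATTTA"  # coding-strand form of the AUUUA instability element

# Hypothetical per-residue codon preferences, most preferred first.
RANKED = {"I": ["ATA", "ATT"], "F": ["TTT", "TTC"], "S": ["AGT", "TCA"]}


def back_translate(protein, ranked, motif=MOTIF):
    """Pick, for each residue, the highest-ranked codon that does not create `motif`."""
    dna = ""
    for residue in protein:
        for codon in ranked[residue]:
            # Any new motif occurrence must end inside the new codon, so it is
            # enough to check the last len(motif) - 1 bases plus the codon.
            window = dna[-(len(motif) - 1):] + codon
            if motif not in window:
                dna += codon
                break
        else:
            raise ValueError(f"every codon for {residue!r} creates {motif}")
    return dna


print(back_translate("IFS", RANKED))
# prints: ATATTTTCA (the top-ranked AGT for Ser would have produced ...ATTTAGT)
```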

Expression levels of native rTHY-1 and of rTHY-1 with the HIV-1
envelope codons were quantitated by immunofluorescence of transiently
transfected 293T cells. Figure 8 shows that expression of the native THY-1
gene is almost two orders of magnitude above the background level of the
control transfected cells (pCDM7). In contrast, expression of the synthetic
ratTHY-1 is substantially lower than that of the native gene (shown by the shift
of the peak towards a lower channel number).

To exclude the possibility that negative sequence elements promoting
mRNA degradation were inadvertently introduced, a construct was generated in
which the rTHY-1env gene was cloned at the 3' end of the synthetic gp120 gene
(Figure 9B). In this experiment, 293T cells were transfected with either the
syngp120mn gene or the syngp120/ratTHY-1env fusion gene
(syngp120mn.rTHY-1env). Expression was measured by immunoprecipitation
with CD4:IgG fusion protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene has a UAG stop codon upstream of
rTHY-1env, rTHY-1env is not translated from this transcript. If negative
elements conferring enhanced degradation were present in the sequence, gp120
protein levels expressed from this construct should be decreased in comparison
to the syngp120mn construct without rTHY-1env. Figure 9A shows that the
expression of both constructs is similar, indicating that the low expression of
rTHY-1env must be linked to translation rather than to mRNA degradation.
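The control above relies on translation stopping at the first in-frame stop codon, so that sequence placed downstream is transcribed but not translated. A minimal sketch of that scan follows; `first_stop_end` is a hypothetical helper and the short transcript is a made-up toy, not a sequence from this work.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_end(seq: str, start: int = 0) -> Optional[int]:
    """Index just past the first in-frame stop codon read from `start`, or None."""
    s = seq.upper().replace("U", "T")
    for i in range(start, len(s) - 2, 3):
        if s[i:i + 3] in STOP_CODONS:
            return i + 3
    return None


# Toy transcript: ATG GCC TAG, followed by a downstream "fusion partner" ATG AAA.
transcript = "ATGGCCTAGATGAAA"
end = first_stop_end(transcript)
print(end, transcript[end:])  # prints: 9 ATGAAA (downstream part is not translated)
```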
Rev-dependent expression of synthetic ratTHY-1 gene with envelope codons

To explore whether rev is able to regulate expression of a ratTHY-1
gene having env codons, a construct was made with a rev-binding site at the 3'
end of the rTHY-1env open reading frame. To measure the rev-responsiveness of
the ratTHY-1env construct having a 3' RRE, human 293T cells were
cotransfected with ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours
post transfection, cells were detached with 1 mM EDTA in PBS and stained
with the OX-7 anti-rTHY-1 mouse monoclonal antibody and a secondary
FITC-conjugated antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described in greater detail
below.

In repeated experiments, a slight increase of rTHY-1env expression
was detected when rev was cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system, a construct expressing a secreted
version of rTHY-1env was generated. This construct should produce more
reliable data because the amount of secreted protein accumulated in the
supernatant reflects protein production over an extended period, in contrast to
surface-expressed protein, which appears to reflect the current production rate
more closely. A gene capable of expressing a secreted form was prepared by
PCR using forward and reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for phosphatidylinositol glycan
anchorage, respectively. The PCR product was cloned into a plasmid which
already contained a CD5 leader sequence, thus generating a construct in which
the membrane anchor has been deleted and the leader sequence replaced by a
heterologous (and probably more efficient) leader peptide.
The rev-responsiveness of the secreted form of ratTHY-1env was
measured by immunoprecipitation of supernatants of human 293T cells
cotransfected with a plasmid expressing a secreted form of ratTHY-1env with
the RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or pCMVrev.
The rTHY-1envPI-RRE construct was made by PCR using the oligonucleotide
cgcggggctagcgcaaagagtaataagtttaac (SEQ ID NO:38) as the forward primer, the
oligonucleotide cgcggatcccttgtattttgtactaata (SEQ ID NO:39) as the reverse
primer, and the synthetic rTHY-1env construct as a template. After digestion
with NheI and NotI, the PCR fragment was cloned into a plasmid containing
CD5 leader and RRE sequences. Supernatants of 35S-labeled cells were
harvested 72 hours post transfection, precipitated with the mouse monoclonal
antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run on a
12% reducing SDS-PAGE gel.

In this experiment, the induction of rTHY-1env by rev was much
more prominent and clear-cut than in the experiment described above, and
strongly suggests that rev is able to translationally regulate transcripts that are
suppressed by low-usage codons.

Rev-independent expression of a rTHY-1env:immunoglobulin fusion protein

To test whether low-usage codons must be present throughout the
whole coding sequence or whether a short region is sufficient to confer
rev-responsiveness, a rTHY-1env:immunoglobulin fusion protein was generated.
In this construct the rTHY-1env gene (without the sequence motif responsible
for phosphatidylinositol glycan anchorage) is linked to the human IgG1 hinge,
CH2 and CH3 domains. This construct was generated by anchor PCR using
primers with NheI and BamHI restriction sites and rTHY-1env as template.
The PCR fragment was cloned into a plasmid containing the leader sequence of
the CD5 surface molecule and the hinge, CH2 and CH3 parts of human IgG1
immunoglobulin. A HindIII/EagI fragment containing the rTHY-1enveg1 insert
was subsequently cloned into a pCDM7-derived plasmid with the RRE
sequence.

To measure the response of the rTHY-1env/immunoglobulin fusion
gene (rTHY-1enveg1rre) to rev, human 293T cells were cotransfected with
rTHY-1enveg1rre and either pCDM7 or pCMVrev. The rTHY-1enveg1rre
construct was made by anchor PCR using forward and reverse primers with
NheI and BamHI restriction sites, respectively. The PCR fragment was cloned
into a plasmid containing a CD5 leader and human IgG1 hinge, CH2 and CH3
domains. Supernatants of 35S-labeled cells were harvested 72 hours post
transfection, precipitated with the mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE
gel. The procedures used are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into the supernatant.
Thus, any rev responsiveness of this gene should be readily detectable in this
assay. However, in contrast to rTHY-1envPI-, cotransfection of rev in trans
induced no, or only a negligible, increase of rTHY-1enveg1 expression.

The expression of rTHY-1:immunoglobulin fusion proteins with
native rTHY-1 or HIV envelope codons was measured by immunoprecipitation.
Briefly, human 293T cells were transfected with either rTHY-1enveg1 (env
codons) or rTHY-1wteg1 (native codons). The rTHY-1wteg1 construct was
generated in a manner similar to that used for the rTHY-1enveg1 construct,
except that a plasmid containing the native rTHY-1 gene was used as
template. Supernatants of 35S-labeled cells were harvested 72 hours post
transfection, precipitated with the mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE
gel. The procedures used in this experiment are described in greater detail
below.

Expression levels of rTHY-1enveg1 were decreased in comparison to a
similar construct with wild-type rTHY-1 as the fusion partner, but were still
considerably higher than those of rTHY-1env. Accordingly, both parts of the
fusion protein influenced expression levels: the addition of rTHY-1env did not
restrict expression to the level seen for rTHY-1env alone. Thus, regulation by
rev appears to be ineffective unless protein expression is almost completely
suppressed.
Codon preference in HIV-1 envelope genes

A direct comparison between the codon usage frequencies of the HIV
envelope and of highly expressed human genes reveals a striking difference for
every amino acid that is encoded by more than one codon. One simple measure
of the statistical significance of this codon preference is the finding that among
the nine amino acids with twofold codon degeneracy, the favored third residue
is A or U in all nine. The probability that nine independent choices between
two equiprobable alternatives all come out the same way is approximately
0.004, and hence by any conventional measure the third residue choice cannot
be considered random. Further evidence of a skewed codon preference is found
among the more degenerate codons, where a strong selection for triplets
bearing adenine can be seen. This contrasts with the pattern for highly
expressed genes, which favor codons bearing C, or less commonly G, in the
third position of codons with threefold or higher degeneracy.
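The 0.004 figure follows from treating each of the nine twofold-degenerate amino acids as an independent, equiprobable choice between an A/U-ending and a G/C-ending codon. The short calculation below is an illustration of that arithmetic, not the authors' own computation: the specific outcome "all nine favor A or U" has probability (1/2)^9 = 1/512, and "all nine the same" (all A/U or all G/C) doubles that to 1/256.

```python
from fractions import Fraction


def prob_all_same(n_choices: int) -> Fraction:
    """P(n independent, equiprobable binary choices all come out the same way)."""
    one_sided = Fraction(1, 2) ** n_choices  # e.g. all nine favor A/U
    return 2 * one_sided  # all A/U or all G/C


p = prob_all_same(9)
print(p, float(p))  # prints: 1/256 0.00390625
```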

The systematic exchange of native codons with codons of highly
expressed human genes dramatically increased expression of gp120. A
quantitative analysis by ELISA showed that expression of the synthetic gene
was at least 25-fold higher than that of native gp120 after transient
transfection into human 293 cells. The concentrations measured in the ELISA
experiment shown were rather low. Because the ELISA used for quantification
is based on gp120 binding to CD4, only native, non-denatured material was
detected, which may explain the apparently low expression. Measurement of
cytoplasmic mRNA levels demonstrated that the difference in protein
expression is due to translational differences and not to mRNA stability.

Retroviruses in general do not show a preference for A and T similar
to that found for HIV. But when this family was divided into two subgroups,
lentiviruses and non-lentiviral retroviruses, a similar preference for A and, less
frequently, T was detected at the third codon position in lentiviruses. Thus,
the available evidence suggests that lentiviruses retain a characteristic pattern
of envelope codons not because of an inherent advantage of such residues for
reverse transcription or replication, but rather for some reason peculiar to the
physiology of that class of viruses. The major difference between lentiviruses
and non-complex retroviruses is the presence of additional regulatory and
nonessential accessory genes in lentiviruses, as already mentioned. Thus, one
simple explanation for the restriction of envelope expression might be that an
important regulatory mechanism of one of these additional molecules is based
on it. One of these proteins, rev, most likely has homologues in all
lentiviruses. Thus codon usage in viral mRNA may be used to create a class of
transcripts which is susceptible to the stimulatory action of rev.
This hypothesis was tested using a strategy similar to that above, but
with codon usage changed in the opposite direction: the codons of a highly
expressed cellular gene were replaced with the codons most frequently used in
the HIV envelope. As predicted, expression levels were considerably lower than
those of the native molecule, by almost two orders of magnitude when
analyzed by immunofluorescence of the surface-expressed molecule. When rev
was coexpressed in trans and an RRE element was present in cis, only a slight
induction was found for the surface molecule. However, when THY-1 was
expressed as a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can probably be explained
by the accumulation of secreted protein in the supernatant, which considerably
amplifies the rev effect. If rev induces only a minor increase for surface
molecules in general, induction of the HIV envelope by rev cannot serve to
increase its surface abundance, but rather to increase the intracellular gp160
level. It is completely unclear at the moment why this should be the case.

To test whether small portions of a gene are sufficient to restrict
expression and render it rev-dependent, rTHY-1env:immunoglobulin fusion
proteins were generated in which only about one third of the total gene had the
envelope codon usage. Expression levels of this construct were intermediate,
indicating that the rTHY-1env negative sequence element is not dominant over
the immunoglobulin part. This fusion protein was not, or only slightly,
rev-responsive, indicating that only genes that are almost completely
suppressed can be rev-responsive.

Another characteristic feature found in the codon frequency tables is
a striking underrepresentation of CpG-containing triplets. In a comparative
study of codon usage in E. coli, yeast, Drosophila and primates, it was shown
that in a large number of the primate genes analyzed, the 8 least used codons
comprise all of the codons containing the CpG dinucleotide. Avoidance of
codons containing this dinucleotide motif was also found in the sequences of
other retroviruses. It seems plausible that the underrepresentation of
CpG-bearing triplets is related to the avoidance of gene silencing by
methylation of CpG cytosines. The observed number of CpG dinucleotides for
HIV as a whole is about one fifth of that expected on the basis of its base
composition. This might indicate that the possibility of high expression is
preserved, and that the gene in fact has to be highly expressed at some point
during viral pathogenesis.
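The "one fifth" statement is an observed-to-expected CpG ratio. One common way to compute it, assumed here rather than taken from the source, is expected CpG = (number of C × number of G) / sequence length; on that measure a ratio near 0.2 corresponds to the underrepresentation described for HIV. The function name is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G content."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        return float("nan")
    observed = s.count("CG")  # CpG sites cannot overlap, so count() is exact
    expected = s.count("C") * s.count("G") / n
    return observed / expected if expected else float("nan")


print(cpg_observed_expected("CGCG"))  # prints: 2.0
print(cpg_observed_expected("CCGG"))  # prints: 1.0
```

For a real genome the same function would be applied to the full nucleotide sequence; the toy inputs only check the arithmetic.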

The results presented herein clearly indicate that codon preference
has a severe effect on protein levels, and suggest that translational elongation
controls mammalian gene expression. However, other factors may play a role.
First, the abundance of mRNAs that are not maximally loaded with ribosomes in
eukaryotic cells indicates that initiation is rate-limiting for translation in at least
some cases, since otherwise all transcripts would be completely covered by
ribosomes. Furthermore, if ribosome stalling and subsequent mRNA
degradation were the mechanism, suppression by rare codons could most likely
not be reversed by any regulatory mechanism like the one presented herein.
One possible explanation for the influence of both initiation and elongation on
translational activity is that the rate of initiation, or access to ribosomes, is
controlled in part by cues distributed throughout the RNA, such that the
lentiviral codons predispose the RNA to accumulate in a pool of poorly
initiated RNAs. However, this limitation need not be kinetic; for example, the
choice of codons could influence the probability that a given translation
product, once initiated, is properly completed. Under this mechanism, an
abundance of less favored codons would incur a significant cumulative
probability of failure to complete the nascent polypeptide chain. The
sequestered RNA would then be lent an improved rate of initiation by the action
of rev. Since adenine residues are abundant in rev-responsive transcripts, it
could be that RNA adenine methylation mediates this translational suppression.
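The cumulative-failure idea can be written as P(complete) = product over codons i of (1 - p_i), assuming each codon independently causes premature termination with probability p_i. The per-codon probabilities below are purely hypothetical; the sketch only shows how a small per-codon penalty compounds across many disfavored codons.

```python
import math


def completion_probability(failure_probs):
    """Probability that a nascent chain is completed when each codon i
    independently causes premature termination with probability failure_probs[i]."""
    return math.prod(1.0 - f for f in failure_probs)


# Hypothetical: per-codon failure probability 0.01 at 10 vs. 100 disfavored codons.
print(round(completion_probability([0.01] * 10), 3))   # prints: 0.904
print(round(completion_probability([0.01] * 100), 3))  # prints: 0.366
```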
Detailed Procedures

The following procedures were used in the above-described
experiments.
Sequence Analysis

Sequence analyses employed the software developed by the
University of Wisconsin Genetics Computer Group.
Plasmid constructions

Plasmid constructions employed the following methods. Vector and
insert DNA were digested at a concentration of 0.5 µg/10 µl in the appropriate
restriction buffer for 1-4 hours (total reaction volume approximately 30 µl).
Digested vector was treated with 10% (v/v) of 1 µg/ml calf intestine alkaline
phosphatase for 30 min prior to gel electrophoresis. Both vector and insert
digests (5 to 10 µl each) were run on a 1.5% low melting agarose gel with TAE
buffer. Gel slices containing bands of interest were transferred into a 1.5 ml
reaction tube, melted at 65 °C and added directly to the ligation without
removal of the agarose. Ligations were typically done in a total volume of
25 µl in 1x Low Buffer, 1x Ligation Additions with 200-400 U of ligase, 1 µl of
vector, and 4 µl of insert. When necessary, 5' overhanging ends were filled by
adding 1/10 volume of 250 µM dNTPs and 2-5 U of Klenow polymerase to
heat-inactivated or phenol-extracted digests and incubating for approximately
20 min at room temperature. When necessary, 3' overhanging ends were
blunted by adding 1/10 volume of 2.5 mM dNTPs and 5-10 U of T4 DNA
polymerase to heat-inactivated or phenol-extracted digests, followed by
incubation at 37 °C for 30 min. The following buffers were used in these
reactions: 10x Low buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM
NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Medium
buffer (60 mM Tris HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA,
70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60 mM Tris HCl,
pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM
β-mercaptoethanol, 0.02% NaN3); 10x Ligation additions (1 mM ATP, 20 mM
DTT, 1 mg/ml BSA, 10 mM spermidine); 50x TAE (2 M Tris acetate, 50 mM
EDTA).

Oligonucleotide synthesis and purification

Oligonucleotides were produced on a Milligen 8750 synthesizer
(Millipore). The columns were eluted with 1 ml of 30% ammonium hydroxide,
and the eluted oligonucleotides were deblocked at 55 °C for 6 to 12 hours.
After deblocking, 150 µl of oligonucleotide were precipitated with 10x volume
of unsaturated n-butanol in 1.5 ml reaction tubes, followed by centrifugation at
15,000 rpm in a microfuge. The pellet was washed with 70% ethanol and
resuspended in 50 µl of H2O. The concentration was determined by measuring
the optical density at 260 nm in a dilution of 1:333 (1 OD260 = 30 µg/ml).
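The concentration calculation is a single multiplication: stock concentration = OD260 of the diluted sample × dilution factor × µg/ml per OD unit. The sketch below uses the factors given here for oligonucleotides (1:333 and 30 µg/ml per OD); the same function covers the 1:200 and 50 µg/ml factors used for plasmid DNA. The OD reading of 0.10 and the function name are made-up examples.

```python
def concentration_ug_per_ml(od260: float, dilution_factor: float,
                            ug_per_ml_per_od: float) -> float:
    """Stock concentration from an OD260 reading taken on a diluted sample."""
    return od260 * dilution_factor * ug_per_ml_per_od


# Hypothetical reading of 0.10 for an oligonucleotide diluted 1:333.
print(round(concentration_ug_per_ml(0.10, 333, 30), 1))  # prints: 999.0 (µg/ml)
```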

The following oligonucleotides were used for construction of the
synthetic gp120 gene (all sequences shown in this text are in the 5' to 3'
direction).

oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag ctg (SEQ ID NO:1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac ggc gtg ccc gtg tgg
aag ag ag gcc acc acc acc ctg ttc tgc gcc agc gac gcc aag gcg tac gac acc gag
gtg cac aac gtg tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag gag gtg
gag ctc gtg aac gtg acc gag aac ttc aac at (SEQ ID NO:2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga agt tct c (SEQ ID
NO:3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa gaa caa cat (SEQ ID
NO:4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat gag gac atc atc agc
ctg tgg gac cag agc ctg aag ccc tgc gtg aag ctg acc cc ctg tgc gtg acc tg aac tgc
acc gac ctg agg aac acc acc aac acc aac ac agc acc gcc aac aac aac agc aac agc
gag ggc acc atc aag ggc ggc gag atg (SEQ ID NO:5).

oligo 2 reverse (PstI): gtt gaa gct gca gtt ctt cat ctc gcc gcc ctt (SEQ
ID NO:6).

oligo 3 forward (PstI): gaa gaa ctg cag ctt caa cat cac cac cag c (SEQ
ID NO:7).

oligo 3: aac atc acc acc agc atc cgc gac aag atg cag aag gag tac gcc
ctg ctg tac aag ctg gat atc gtg agc atc gac aac gac agc acc agc tac cgc ctg atc tcc
tgc aac acc agc gtg atc acc cag gcc tgc ccc aag atc agc ttc gag ccc atc ccc atc
cac tac tgc gcc ccc gcc ggc ttc gcc (SEQ ID NO:8).

oligo 3 reverse: gaa ctt ctt gtc ggc ggc gaa gcc ggc ggg (SEQ ID
NO:9).

oligo 4 forward: gcg ccc ccg ccg gct tcg cca tcc tga agt gca acg aca
aga agt tc (SEQ ID NO:10).

oligo 4: gcc gac aag aag ttc agc ggc aag ggc agc tgc aag aac gtg agc
acc gtg cag tgc acc cac ggc atc cgg ccg gtg gtg agc acc cag ctc ctg ctg aac
ggc agc ctg gcc gag gag gag gtg gtg atc cgc agc gag aac ttc acc gac aac gcc aag
acc atc atc gtg cac ctg aat gag agc gtg cag atc (SEQ ID NO:11).

oligo 4 reverse (MluI): agt tgg gac gcg tgc agt tga tct gca cgc tct c
(SEQ ID NO:12).

oligo 5 forward (MluI): gag agc gtg cag atc aac tgc acg cgt ccc
(SEQ ID NO:13).

oligo 5: aac tgc acg cgt ccc aac tac aac aag cgc aag cgc atc cac atc
ggc ccc ggg cgc gcc ttc tac acc acc aag aac atc atc ggc acc atc ctc cag gcc cac
tgc aac atc tct aga (SEQ ID NO:14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat gtt gca (SEQ ID
NO:15).

oligo 6 forward: gca aca tct cta gag cca agt gga acg ac (SEQ ID
NO:16).

oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc gtg agc aag ctg aag
gag cag ttc aag aac aag acc atc gtg ttc ac cag agc agc ggc ggc gac ccc gag atc
gtg atg cac agc ttc aac tgc ggc ggc (SEQ ID NO:17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc gca gtt ga (SEQ
ID NO:18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat tct tct act gc (SEQ
ID NO:19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc ctg ttc aac agc acc tgg
aac ggc aac aac acc tgg aac aac acc acc ggc agc aac aac aat art acc ctc cag tgc
aag atc aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg tac gcc ccc ccc
atc gag ggc cag atc cgg tgc agc agc (SEQ ID NO:20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc acc gga tct ggc cct c
(SEQ ID NO:21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag caa cat cac cgg tct
g (SEQ ID NO:22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac ggc ggc aag gac acc
gac acc aac gac acc gaa atc ttc cgc ccc ggc ggc ggc gac atg cgc gac aac tgg aga
tct gag ctg tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc ccc acc aag
gcc aag cgc cgc gtg gtg cag cgc gag aag cgc (SEQ ID NO:23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc ttc tcg cgc tgc
acc ac (SEQ ID NO:24).

The following oligonucleotides were used for the construction of the
ratTHY-1env gene.

oligo 1 forward (BamHI/HindIII): cgc ggg gga tcc aag ctt acc atg att
cca gta ata agt (SEQ ID NO:25).

oligo 1: atg aat cca gta ata agt ata aca tta tta tta agt gta tta caa atg
agt aga gga caa aga gta ata agt tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt tca tta acg (SEQ ID NO:26).

oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg cgt taa tga aaa ttc
atg ttg (SEQ ID NO:27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt gaa aaa aaa aaa
cat (SEQ ID NO:28).

oligo 2: cgt gaa aaa aaa aaa cat gta tta agt gga aca tta gga gta cca gaa 
cat aca tat aga agt aga gta aat ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat 
ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID NO:29). 

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc aca cat ata atc tcc
(SEQ ID NO:30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc aga gta agt gga
caa (SEQ ID NO:31).

oligo 3: ctc aga gta agt gga caa aat cca aca agt agt aat aaa aca ata aat
gta ata aga gat aaa tta gta aaa tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta
tta tta tta tta tta agt tta agt ttt tta caa gca aca gat ttt ata agt tta tga (SEQ ID
NO:32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc gct tca taa act tat
aaa atc (SEQ ID NO:33).
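The restriction sites named in parentheses in the primer list can be checked mechanically. The sketch below scans a primer for a hand-picked set of recognition sequences, limited (as an assumption) to the enzymes named in this section; all of them are palindromic, so scanning one strand is enough. The two calls use the sequences of SEQ ID NO:25 and SEQ ID NO:33.

```python
# Recognition sequences of the enzymes named in the primer descriptions.
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "MluI": "ACGCGT",
    "NheI": "GCTAGC",
    "NotI": "GCGGCCGC",
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
}


def find_sites(primer: str) -> dict:
    """Map enzyme name -> 0-based position of its first site in the primer."""
    seq = primer.upper().replace(" ", "")
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# SEQ ID NO:25, described as BamHI/HindIII
print(find_sites("cgc ggg gga tcc aag ctt acc atg att cca gta ata agt"))
# prints: {'BamHI': 6, 'HindIII': 12}

# SEQ ID NO:33, described as EcoRI/NotI
print(find_sites("cgc gaa ttc gcg gcc gct tca taa act tat aaa atc"))
# prints: {'EcoRI': 3, 'NotI': 9}
```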

Polymerase Chain Reaction 

Short, overlapping 15 to 25 mer oligonucleotides annealing at both
ends were used to amplify the long oligonucleotides by polymerase chain
reaction (PCR). Typical PCR conditions were: 35 cycles, 55 °C annealing
temperature, 0.2 sec extension time. PCR products were gel purified, phenol
extracted, and used in a subsequent PCR to generate longer fragments
consisting of two adjacent small fragments. These longer fragments were
cloned into a CDM7-derived plasmid containing a leader sequence of the CD5
surface molecule followed by a NheI/PstI/MluI/EcoRI/BamHI polylinker.
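Whether a reverse primer anneals at the 3' end of a long oligonucleotide can be checked by comparing the template's 3' end with the reverse complement of the primer. The sketch below is a plain exact-match string comparison (no melting-temperature or mismatch model), and the helper names are hypothetical. Applied to the 3' end of oligo 1 (SEQ ID NO:26) and oligo 1 reverse (SEQ ID NO:27), it finds a 21-nt annealing region. The remaining primer bases form a 5' tail that carries the EcoRI site and completes the MluI site.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def three_prime_overlap(template: str, reverse_primer: str) -> int:
    """Length of the longest 3' end of `template` that equals the 5' end of
    the reverse complement of `reverse_primer` (the annealing region)."""
    t = template.upper()
    rc = reverse_complement(reverse_primer).upper()
    for k in range(min(len(t), len(rc)), 0, -1):
        if t.endswith(rc[:k]):
            return k
    return 0


# 3' end of oligo 1 (SEQ ID NO:26) and oligo 1 reverse (SEQ ID NO:27).
template_end = "ccaatacaacatgaattttcattaacg"
primer = "cgcggggaattcacgcgttaatgaaaattcatgttg"
print(three_prime_overlap(template_end, primer))  # prints: 21
```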

The following solutions were used in these reactions: 10x PCR
buffer (500 mM KCl, 100 mM Tris HCl, pH 7.5, 8 mM MgCl2, 2 mM each
dNTP). The final buffer was supplemented with 10% DMSO to increase
fidelity of the Taq polymerase.

Small scale DNA preparation

Transformed bacteria were grown in 3 ml LB cultures for more than
6 hours or overnight. Approximately 1.5 ml of each culture was poured into
1.5 ml microfuge tubes, spun for 20 seconds to pellet the cells, and
resuspended in 200 µl of solution I. Subsequently, 400 µl of solution II and
300 µl of solution III were added. The microfuge tubes were capped, mixed and
spun for >30 sec. Supernatants were transferred into fresh tubes and phenol
extracted once. DNA was precipitated by filling the tubes with isopropanol,
mixing, and spinning in a microfuge for >2 min. The pellets were rinsed in
70% ethanol and resuspended in 50 µl dH2O containing 10 µl of RNAse A. The
following media and solutions were used in these procedures: LB medium
(1.0% NaCl, 0.5% yeast extract, 1.0% tryptone); solution I (10 mM EDTA
pH 8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M KOAc,
2.5 M glacial acetic acid); phenol (pH adjusted to 6.0, overlaid with TE); TE
(10 mM Tris HCl, pH 7.5, 1 mM EDTA pH 8.0).

Large scale DNA preparation 

One liter cultures of transformed bacteria were grown 24 to 36 hours
(MC1061p3 transformed with pCDM derivatives) or 12 to 16 hours (MC1061
transformed with pUC derivatives) at 37 °C in either M9 bacterial medium
(pCDM derivatives) or LB (pUC derivatives). Bacteria were spun down in 1
liter bottles using a Beckman J6 centrifuge at 4,200 rpm for 20 min. The pellet
was resuspended in 40 ml of solution I. Subsequently, 80 ml of solution II and
40 ml of solution III were added and the bottles were shaken semivigorously
until lumps of 2 to 3 mm size developed. The bottle was spun at 4,200 rpm for
5 min and the supernatant was poured through cheesecloth into a 250 ml bottle.

Isopropanol was added to the top and the bottle was spun at 4,200
rpm for 10 min. The pellet was resuspended in 4.1 ml of solution I and added
to 4.5 g of cesium chloride, 0.3 ml of 10 mg/ml ethidium bromide, and 0.1 ml
of 1% Triton X-100 solution. The tubes were spun in a Beckman J2 high speed
centrifuge at 10,000 rpm for 5 min. The supernatant was transferred into
Beckman Quick Seal ultracentrifuge tubes, which were then sealed and spun in
a Beckman ultracentrifuge using an NVT90 fixed angle rotor at 80,000 rpm for >
2.5 hours. The band was extracted under visible light using a 1 ml syringe and a
20-gauge needle. An equal volume of dH2O was added to the extracted material.
DNA was extracted once with n-butanol saturated with 1 M sodium chloride,
followed by addition of an equal volume of 10 M ammonium acetate/1 mM
EDTA. The material was poured into a 13 ml snap tube which was then filled
to the top with absolute ethanol, mixed, and spun in a Beckman J2 centrifuge at
10,000 rpm for 10 min. The pellet was rinsed with 70% ethanol and
resuspended in 0.5 to 1 ml of H2O. The DNA concentration was determined by
measuring the optical density at 260 nm of a 1:200 dilution (1 OD260 = 50
µg/ml).
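The OD260 conversion stated above can be written as a small helper. The reading of 0.1 in the usage lines is hypothetical, and the 0.5 ml volume is the lower end of the resuspension range given above.

```python
# Convert an OD260 reading of a diluted sample back to the stock concentration,
# using the conversion stated in the protocol: 1 OD260 unit = 50 ug/ml dsDNA.

UG_PER_ML_PER_OD260 = 50.0


def dna_conc_ug_per_ml(od260: float, dilution: float = 200.0) -> float:
    """Stock concentration (ug/ml) from OD260 measured on a 1:dilution sample."""
    return od260 * UG_PER_ML_PER_OD260 * dilution


def dna_yield_ug(conc_ug_per_ml: float, volume_ml: float) -> float:
    """Total DNA in the resuspended pellet."""
    return conc_ug_per_ml * volume_ml


if __name__ == "__main__":
    conc = dna_conc_ug_per_ml(0.1)        # hypothetical OD260 reading
    print(conc, dna_yield_ug(conc, 0.5))  # 0.5 ml resuspension volume
```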

The following media and buffers were used in these procedures: M9
bacterial medium (10 g M9 salts, 10 g casamino acids (hydrolyzed), 10 ml M9
additions, 7.5 µg/ml tetracycline (500 µl of a 15 mg/ml stock solution), 12.5
µg/ml ampicillin (125 µl of a 10 mg/ml stock solution)); M9 additions (10 mM
CaCl2, 100 mM MgSO4, 200 µg/ml thiamine, 70% glycerol); LB medium (1.0%
NaCl, 0.5% yeast extract, 1.0% tryptone); Solution I (10 mM EDTA, pH
8.0); Solution II (0.2 M NaOH, 1.0% SDS); Solution III (2.5 M KOAc, 2.5 M
HOAc).

Sequencing

Synthetic genes were sequenced by the Sanger dideoxynucleotide
method. In brief, 20 to 50 µg of double-stranded plasmid DNA were denatured in
0.5 M NaOH for 5 min. Subsequently the DNA was precipitated with 1/10
volume of sodium acetate (pH 5.2) and 2 volumes of ethanol and centrifuged
for 5 min. The pellet was washed with 70% ethanol and resuspended at a
concentration of 1 µg/µl. The annealing reaction was carried out with 4 µg of
template DNA and 40 ng of primer in 1x annealing buffer in a final volume of
10 µl. The reaction was heated to 65 °C and slowly cooled to 37 °C.

In a separate tube, 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of
dH2O, 1 µl of [35S] dATP (10 µCi), and 0.25 µl of Sequenase™ (12 U/µl) were
combined for each reaction. Five µl of this mix were added to each annealed
primer-template tube and incubated for 5 min at room temperature. For each
labeling reaction, 2.5 µl of each of the 4 termination mixes were added to a
Terasaki plate and prewarmed at 37 °C. At the end of the incubation period, 3.5
µl of labeling reaction were added to each of the 4 termination mixes. After 5
min, 4 µl of stop solution were added to each reaction and the Terasaki plate
was incubated at 80 °C for 10 min in an oven. The sequencing reactions were
run on a 5% denaturing polyacrylamide gel. An acrylamide solution was
prepared by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to 100 g of
acrylamide:bisacrylamide (29:1). A 5% polyacrylamide, 46% urea, 1x TBE
gel was prepared by combining 38 ml of acrylamide solution and 28 g of urea.
Polymerization was initiated by the addition of 400 µl of 10% ammonium
peroxodisulfate and 60 µl of TEMED. Gels were poured using silanized glass
plates and sharktooth combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4
hours (depending on the region to be read). Gels were transferred to Whatman
blotting paper, dried at 80 °C for about 1 hour, and exposed to x-ray film at
room temperature. Typical exposure time was 12 hours. The following
solutions were used in these procedures: 5x Annealing buffer (200 mM Tris-HCl,
pH 7.5, 100 mM MgCl2, 250 mM NaCl); Labeling Mix (7.5 µM each
dCTP, dGTP, and dTTP); Termination Mixes (80 µM each dNTP, 50 mM
NaCl, 8 µM ddNTP (one each)); Stop solution (95% formamide, 20 mM
EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol); 5x TBE (0.9 M Tris
borate, 20 mM EDTA); Polyacrylamide solution (96.7 g acrylamide, 3.3 g
bisacrylamide, 200 ml 1x TBE, 957 ml dH2O).
RNA isolation 

Cytoplasmic RNA was isolated from calcium phosphate transfected
293T cells 36 hours post transfection and from vaccinia infected HeLa cells 16
hours post infection, essentially as described by Gilman (Gilman, Preparation
of cytoplasmic RNA from tissue culture cells. In Current Protocols in
Molecular Biology, Ausubel et al., eds., Wiley & Sons, New York, 1992).
Briefly, cells were lysed in 400 µl lysis buffer, nuclei were spun out, and SDS
and proteinase K were added to 0.2% and 0.2 mg/ml, respectively. The
cytoplasmic extracts were incubated at 37 °C for 20 min, phenol/chloroform
extracted twice, and precipitated. The RNA was dissolved in 100 µl buffer I
and incubated at 37 °C for 20 min. The reaction was stopped by adding 25 µl
stop buffer, and the RNA was precipitated again.
The following solutions were used in this procedure: Lysis Buffer
(TRUSTEE containing 50 mM Tris pH 8.0, 100 mM NaCl, 5 mM MgCl2,
0.5% NP40); Buffer I (TRUSTEE buffer with 10 mM MgCl2, 1 mM DTT, 0.5
U/µl placental RNase inhibitor, 0.1 U/µl RNase-free DNase I); Stop buffer
(50 mM EDTA, 1.5 M NaOAc, 1.0% SDS).

Slot blot analysis

For slot blot analysis, 10 µg of cytoplasmic RNA was dissolved in 50
µl dH2O to which 150 µl of 10x SSC/18% formaldehyde were added. The
solubilized RNA was then incubated at 65 °C for 15 min and spotted onto
nitrocellulose with a slot blot apparatus. Radioactively labeled probes of 1.5 kb
gp120IIIb and syngp120mn fragments were used for hybridization. Each of the two
fragments was random labeled in a 50 µl reaction with 10 µl of 5x oligo-labeling
buffer, 8 µl of 2.5 mg/ml BSA, 4 µl of [α-32P]-dCTP (20 µCi/µl; 6000 Ci/mmol),
and 5 U of Klenow fragment. After 1 to 3 hours incubation at 37 °C, 100 µl of
TRUSTEE were added and unincorporated [α-32P]-dCTP was eliminated using a
G50 spin column. Activity was measured in a Beckman beta-counter, and
equal specific activities were used for hybridization. Membranes were
prehybridized for 2 hours and hybridized for 12 to 24 hours at 42 °C with 0.5 x 10^6
cpm probe per ml hybridization fluid. The membrane was washed twice (5
min each) with washing buffer I at room temperature and for one hour in washing
buffer II at 65 °C, and then exposed to x-ray film. Similar results were obtained
using a 1.1 kb NotI/SfiI fragment of pCDM7 containing the 3' untranslated region.
Control hybridizations were done in parallel with a random-labeled human
beta-actin probe. RNA expression was quantitated by scanning the hybridized
nitrocellulose membranes with a Magnetic Dynamics phosphorimager.
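One way to express the normalization against the parallel beta-actin hybridization is sketched below. This is an assumed workflow, not the analysis stated in the text: it presumes integrated phosphorimager counts per slot and a background value per membrane, and the function names are illustrative.

```python
# Normalize slot-blot signals to the beta-actin control hybridized in parallel.
# Assumptions: phosphorimager counts for each slot are already integrated, and a
# background value (for example an empty slot) is available for each membrane.


def normalized_signal(target: float, target_bg: float,
                      actin: float, actin_bg: float) -> float:
    """Background-subtracted target signal divided by the matching actin signal."""
    target_net = target - target_bg
    actin_net = actin - actin_bg
    if actin_net <= 0:
        raise ValueError("actin signal must exceed background")
    return target_net / actin_net


def relative_expression(sample: float, reference: float) -> float:
    """Fold difference between two normalized signals (e.g. synthetic vs native)."""
    if reference == 0:
        raise ZeroDivisionError("reference signal is zero")
    return sample / reference
```

Normalizing each slot to actin before comparing constructs corrects for unequal RNA loading between lanes.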

The following solutions were used in this procedure:
5x Oligo-labeling buffer (250 mM Tris-HCl, pH 8.0, 25 mM MgCl2, 5 mM
β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, 2 mM dTTP, 1 M Hepes pH 6.6, 1
mg/ml hexanucleotides [dNTP]6); Hybridization Solution (0.05 M sodium
phosphate, 250 mM NaCl, 7% SDS, 1 mM EDTA, 5% dextran sulfate, 50%
formamide, 100 µg/ml denatured salmon sperm DNA); Washing buffer I (2x
SSC, 0.1% SDS); Washing buffer II (0.5x SSC, 0.1% SDS); 20x SSC (3 M NaCl,
0.3 M Na3 citrate, pH adjusted to 7.0).
Vaccinia recombination

Vaccinia recombination used a modification of the method
described by Romeo and Seed (Romeo and Seed, Cell 64:1037, 1991).
Briefly, CV1 cells at 70 to 90% confluency were infected with 1 to 3 µl of a
wild-type vaccinia stock WR (2 x 10^8 pfu/ml) for 1 hour in culture medium
without calf serum. After 24 hours, the cells were transfected by calcium
phosphate with 25 µg TKG plasmid DNA per dish. After an additional 24 to
48 hours the cells were scraped off the plate, spun down, and resuspended in a
volume of 1 ml. After 3 freeze/thaw cycles, trypsin was added to 0.05 mg/ml
and lysates were incubated for 20 min. A dilution series of 10, 1 and 0.1 µl of
this lysate was used to infect small dishes (6 cm) of CV1 cells that had been
pretreated with 12.5 µg/ml mycophenolic acid, 0.25 mg/ml xanthine and 1.36
mg/ml hypoxanthine for 6 hours. Infected cells were cultured for 2 to 3 days
and subsequently stained with the monoclonal antibody NEA9301 against
gp120 and an alkaline phosphatase conjugated secondary antibody. Cells were
incubated with 0.33 mg/ml NBT and 0.16 mg/ml BCIP in AP buffer and finally
overlaid with 1% agarose in PBS. Positive plaques were picked and
resuspended in 100 µl Tris pH 9.0. The plaque purification was repeated once.
To produce high titer stocks, the infection was slowly scaled up. Finally, one
large plate of HeLa cells was infected with half of the virus of the previous
round. Infected cells were detached in 3 ml of PBS, lysed with a Dounce
homogenizer and cleared of larger debris by centrifugation. VPE-8
recombinant vaccinia stocks were kindly provided by the AIDS repository,
Rockville, MD, and express HIV-1 IIIB gp120 under the 7.5 mixed early/late
promoter (Earl et al., J. Virol. 65:31, 1991). In all experiments with
recombinant vaccinia, cells were infected at a multiplicity of infection of at least
10.
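The multiplicity-of-infection arithmetic behind these steps can be sketched as follows. The stock titer (2 x 10^8 pfu/ml) and the 1 to 3 µl inoculum come from the text; the number of cells per dish is not given and has to be supplied by the user.

```python
# Multiplicity-of-infection (MOI) arithmetic for the vaccinia steps above.
# Assumption: the cell number per dish is counted or estimated separately;
# it is not stated in the protocol.


def pfu_in_volume(volume_ul: float, titer_pfu_per_ml: float) -> float:
    """Plaque-forming units delivered by volume_ul of a stock."""
    return volume_ul * titer_pfu_per_ml / 1000.0


def inoculum_ul(cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of stock (ul) that gives the requested MOI (pfu per cell)."""
    return cells * moi * 1000.0 / titer_pfu_per_ml


def achieved_moi(volume_ul: float, titer_pfu_per_ml: float, cells: float) -> float:
    """MOI obtained by adding volume_ul of stock to a given number of cells."""
    return pfu_in_volume(volume_ul, titer_pfu_per_ml) / cells


if __name__ == "__main__":
    WR_TITER = 2e8  # pfu/ml, wild-type WR stock from the text
    print(pfu_in_volume(1, WR_TITER), pfu_in_volume(3, WR_TITER))
```

For example, 1 to 3 µl of the WR stock corresponds to 2 x 10^5 to 6 x 10^5 pfu.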

The following solution was used in this procedure:

AP buffer (100 mM Tris-HCl, pH 9.5, 100 mM NaCl, 5 mM MgCl2)
Cell culture 

The monkey kidney carcinoma cell lines CV1 and Cos7, the human
kidney carcinoma cell line 293T, and the human cervix carcinoma cell line
HeLa were obtained from the American Type Culture Collection and were
maintained in supplemented IMDM. They were kept on 10 cm tissue culture
plates and typically split 1:5 to 1:20 every 3 to 4 days. The following
medium was used in this procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco's medium, 10% calf
serum (iron-complemented, heat inactivated for 30 min at 56 °C), 0.3 mg/ml
L-glutamine, 25 µg/ml gentamycin, 0.5 mM β-mercaptoethanol (pH adjusted with
5 M NaOH, 0.5 ml)).

Transfection

Calcium phosphate transfection of 293T cells was performed by
slowly adding 10 µg plasmid DNA in 250 µl 0.25 M CaCl2 to the same volume
of 2x HEBS buffer while vortexing. After incubation for 10 to 30 min at room
temperature, the DNA precipitate was added to a small dish of 50 to 70%
confluent cells. In cotransfection experiments with rev, cells were transfected
with 10 µg gp120IIIb, gp120IIIbrre, syngp120mnrre or rTHY-1enveglrre and
10 µg of pCMVrev or CDM7 plasmid DNA.
The following solutions were used in this procedure: 2x HEBS buffer
(280 mM NaCl, 10 mM KCl, 1.5 mM sterile filtered); 0.25 M CaCl2
(autoclaved).

Immunoprecipitation
After 48 to 60 hours, the medium was exchanged and cells were
incubated for an additional 12 hours in Cys/Met-free medium containing 200 µCi
of 35S-Translabel. Supernatants were harvested and spun for 15 min at 3000
rpm to remove debris. After addition of the protease inhibitors leupeptin, aprotinin
and PMSF to 2.5 µg/ml, 50 ng/ml and 100 µg/ml, respectively, 1 ml of supernatant
was incubated with either 10 µl of packed protein A sepharose alone
(rTHY-1enveglrre) or with protein A sepharose and 3 ng of a purified
CD4/immunoglobulin fusion protein (kindly provided by Behring) (all gp120
constructs) at 4 °C for 12 hours on a rotator. Subsequently the protein A beads
were washed 5 times for 5 to 15 min each time. After the final wash, 10 µl of
loading buffer was added, and samples were boiled for 3 min and
applied on 7% (all gp120 constructs) or 10% (rTHY-1enveglrre) SDS
polyacrylamide gels (Tris pH 8.8 buffer in the resolving gel, Tris pH 6.8 buffer
in the stacking gel, Tris-glycine running buffer; Maniatis et al., supra, 1989).
Gels were fixed in 10% acetic acid and 10% methanol, incubated with Amplify
for 20 min, dried and exposed for 12 hours.

The following buffers and solutions were used in this procedure:
Wash buffer (100 mM Tris, pH 7.5, 150 mM NaCl, 5 mM CaCl2, 1% NP-40);
5x Running Buffer (125 mM Tris, 1.25 M glycine, 0.5% SDS); Loading buffer
(10% glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromophenol blue).

Immunofluorescence

293T cells were transfected by calcium phosphate coprecipitation
and analyzed for surface THY-1 expression after 3 days. After detachment
with 1 mM EDTA/PBS, cells were stained with the monoclonal antibody OX-7
at a dilution of 1:250 at 4 °C for 20 min, washed with PBS and subsequently
incubated with a 1:500 dilution of a FITC-conjugated goat anti-mouse
immunoglobulin antiserum. Cells were washed again, resuspended in 0.5 ml of
a fixing solution, and analyzed on an EPICS XL cytofluorometer (Coulter).
The following solutions were used in this procedure:
PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4, pH
adjusted to 7.4); Fixing solution (2% formaldehyde in PBS).
ELISA

The concentration of gp120 in culture supernatants was determined
using CD4-coated ELISA plates and goat anti-gp120 antisera in the soluble
phase. Supernatants of 293T cells transfected by calcium phosphate were
harvested after 4 days, spun at 3000 rpm for 10 min to remove debris and
incubated for 12 hours at 4 °C on the plates. After 6 washes with PBS, 100 µl of
goat anti-gp120 antisera diluted 1:200 were added for 2 hours. The plates were
washed again and incubated for 2 hours with a peroxidase-conjugated rabbit
anti-goat IgG antiserum diluted 1:1000. Subsequently the plates were washed and
incubated for 30 min with 100 µl of substrate solution containing 2 mg/ml
o-phenylenediamine in sodium citrate buffer. The reaction was finally stopped
with 100 µl of 4 M sulfuric acid. Plates were read at 490 nm with a Coulter
microplate reader. Purified recombinant gp120IIIb was used as a control. The
following buffers and solutions were used in this procedure: Wash buffer (0.1%
NP40 in PBS); Substrate solution (2 mg/ml o-phenylenediamine in sodium
citrate buffer).
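The protocol does not say how OD490 readings were converted to gp120 concentrations. A common approach, assumed here, is to interpolate each reading between the two bracketing points of a standard curve built from the purified gp120; the sketch below does this linearly and rejects readings outside the standard range.

```python
from bisect import bisect_left

# Read an unknown off a standard curve by linear interpolation between the two
# bracketing standards. Assumptions: the purified gp120 standard is run as a
# dilution series on the same plate, and OD490 rises monotonically with
# concentration over that range.


def interpolate_conc(od: float, standards: list[tuple[float, float]],
                     dilution: float = 1.0) -> float:
    """Sample concentration from its OD, given (concentration, OD) standards."""
    points = sorted(standards, key=lambda p: p[1])
    ods = [p[1] for p in points]
    if not ods[0] <= od <= ods[-1]:
        raise ValueError("OD outside the standard range; dilute and re-assay")
    i = bisect_left(ods, od)
    if ods[i] == od:
        return points[i][0] * dilution
    (c0, od0), (c1, od1) = points[i - 1], points[i]
    return (c0 + (od - od0) * (c1 - c0) / (od1 - od0)) * dilution
```

Readings above the top standard are rejected rather than extrapolated, since the colorimetric signal typically flattens at high concentration.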
EXAMPLE II

A Synthetic Green Fluorescent Protein Gene 

The efficacy of codon replacement for gp120 suggests that replacing
non-preferred codons with less preferred codons or preferred codons (and
replacing less preferred codons with preferred codons) will increase expression
in mammalian cells of other proteins, e.g., other eukaryotic proteins.
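A minimal sketch of the codon replacement described here, assuming the preferred-codon list given at the start of this document is applied to every codon. Glu, Met, Trp and stop codons are not in that list and are left as they are; the sketch also ignores the less preferred tier and any restriction-site or other design constraints.

```python
# Replace each codon with the preferred codon for its amino acid, using the
# preferred-codon list given earlier in this document (Ala gcc, Arg cgc, ...).
# Codons for amino acids absent from that list (Glu, Met, Trp) and stops are kept.

BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {  # standard genetic code, built from the compact TCAG ordering
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}


def codons(cds: str) -> list[str]:
    cds = cds.lower().replace("u", "t")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TABLE[c] for c in codons(cds))


def optimize(cds: str) -> str:
    """Synonymous recoding: each codon -> preferred codon for the same amino acid."""
    return "".join(PREFERRED.get(CODON_TABLE[c], c) for c in codons(cds))


if __name__ == "__main__":
    native = "atgaaattatctgaatgg"  # hypothetical fragment encoding M K L S E W
    synthetic = optimize(native)
    assert translate(synthetic) == translate(native)  # protein is unchanged
    print(native, "->", synthetic)
```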

The green fluorescent protein (GFP) of the jellyfish Aequorea
victoria (Ward, Photochem. Photobiol. 4:1, 1979; Prasher et al., Gene 111:229,
1992; Cody et al., Biochem. 32:1212, 1993) has attracted attention recently for
its possible utility as a marker or reporter for transfection and lineage studies
(Chalfie et al., Science 263:802, 1994).

Examination of a codon usage table constructed from the native
coding sequence of GFP showed that the GFP codons favored either A or U in
the third position. The bias in this case favors A less than does the bias of
gp120, but is substantial. A synthetic gene was created in which the natural
GFP sequence was re-engineered in much the same manner as for gp120 (FIG.
11; SEQ ID NO:40). In addition, the translation initiation sequence of GFP
was replaced with sequences corresponding to the translational initiation
consensus. The expression of the resulting protein was contrasted with that of
the wild type sequence, similarly engineered to bear an optimized translational
initiation consensus (FIG. 10B and FIG. 10C). In addition, the effect of
inclusion of the mutation Ser65→Thr, reported to improve excitation efficiency
of GFP at 490 nm and hence preferred for fluorescence microscopy (Heim et
al., Nature 373:663, 1995), was examined (FIG. 10D). Codon engineering
conferred a significant increase in expression efficiency (and a concomitant
increase in the percentage of cells apparently positive for transfection), and the
combination of the Ser65→Thr mutation and codon optimization resulted in a DNA
segment encoding a highly visible mammalian marker protein (FIG. 10D).
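The codon-usage observation above rests on the base found at the third position of each codon. A small helper for computing that composition is sketched below; the native GFP sequence is not reproduced in this section, so any input used with it here is a placeholder.

```python
from collections import Counter

# Third-position (wobble) base composition of a coding sequence: the quantity
# behind the observation that native GFP codons favor A or U at position 3.


def third_position_fractions(cds: str) -> dict[str, float]:
    """Fraction of codons ending in each base (U is counted as T)."""
    cds = cds.lower().replace("u", "t")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    counts = Counter(cds[i + 2] for i in range(0, len(cds), 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty coding sequence")
    return {base: counts.get(base, 0) / total for base in "acgt"}


def at3(cds: str) -> float:
    """Fraction of codons ending in A or T (U)."""
    fractions = third_position_fractions(cds)
    return fractions["a"] + fractions["t"]
```

Comparing `at3` for a native and a re-engineered sequence shows how far the recoding moves the third position away from A/U.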

The above-described synthetic green fluorescent protein coding
sequence was assembled in a similar manner as for gp120 from six fragments
of approximately 120 bp each, using an assembly strategy that relied on the
ability of the restriction enzymes BsaI and BbsI to cleave outside of their
recognition sequences. Long oligonucleotides were synthesized which
contained portions of the coding sequence for GFP embedded in flanking
sequences encoding EcoRI and BsaI at one end, and BamHI and BbsI at the
other end. Thus, each oligonucleotide has the configuration EcoRI/BsaI/GFP
fragment/BbsI/BamHI. The restriction site ends generated by the BsaI and
BbsI sites were designed to yield compatible ends that could be used to join
adjacent GFP fragments. Each of the compatible ends was designed to be
unique and non-self-complementary. The crude synthetic DNA segments were
amplified by PCR, inserted between EcoRI and BamHI in pUC9, and
sequenced. Subsequently the intact coding sequence was assembled in a
six-fragment ligation, using insert fragments prepared with BsaI and BbsI. Two of
six plasmids resulting from the ligation bore an insert of correct size, and one
contained the desired full length sequence. Mutation of Ser65 to Thr was
accomplished by standard PCR based mutagenesis, using a primer that
overlapped a unique BssSI site in the synthetic GFP.
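The two design rules for the BsaI/BbsI ends (each overhang unique, none self-complementary) can be checked mechanically. The sketch below also flags an overhang that is the reverse complement of another, since such a pair would join the wrong fragments. The GFP overhangs themselves are not listed in the text, so any overhangs passed to it here are placeholders.

```python
# Check a set of 4-nt overhangs (as left by BsaI/BbsI cleavage) for the design
# rules: every overhang unique, none self-complementary, and no overhang equal
# to the reverse complement of another (which would let wrong ends ligate).

COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    return seq.lower().translate(COMPLEMENT)[::-1]


def overhang_problems(overhangs: list[str]) -> list[str]:
    """Return a description of every rule violation; an empty list means OK."""
    problems = []
    seen: set[str] = set()
    for oh in (o.lower() for o in overhangs):
        if oh == revcomp(oh):
            problems.append(f"{oh}: self-complementary (can ligate to itself)")
        if oh in seen:
            problems.append(f"{oh}: duplicated")
        elif revcomp(oh) in seen:
            problems.append(f"{oh}: pairs with an earlier overhang ({revcomp(oh)})")
        seen.add(oh)
    return problems
```

A palindromic overhang such as `gatc` is reported, as is a pair like `aacc`/`ggtt`.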

Codon optimization as a strategy for improved expression in mammalian cells

The data presented here suggest that coding sequence re-engineering
may have general utility for the improvement of expression of mammalian and
non-mammalian eukaryotic genes in mammalian cells. The results obtained
here with three unrelated proteins (HIV gp120, the rat cell surface antigen
Thy-1, and green fluorescent protein from Aequorea victoria), together with those
for human Factor VIII (see below), suggest that codon optimization may prove to
be a fruitful strategy for improving the expression in mammalian cells of a wide
variety of eukaryotic genes.

EXAMPLE III

Design of a Codon-Optimized Gene Expressing Human Factor VIII Lacking
the Central B Domain

A synthetic gene was designed that encodes mature human Factor
VIII lacking amino acid residues 760 to 1639, inclusive (residues 779 to 1658,
inclusive, of the precursor). The synthetic gene was created by choosing
codons corresponding to those favored by highly expressed human genes.
Some deviation from strict adherence to the favored codon pattern was made
to allow unique restriction enzyme cleavage sites to be introduced throughout
the gene to facilitate future manipulations. For preparation of the synthetic
gene, the sequence was then divided into 28 segments of 150 basepairs and a
29th segment of 161 basepairs.
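The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The constant offset of 19 between mature and precursor numbering is consistent with the 19-residue Factor VIII signal peptide.

```python
# Bookkeeping for the B-domain deletion and the segment layout described above.
# All input numbers come from the text; the code only checks that they agree.

MATURE_DELETION = (760, 1639)     # inclusive, mature numbering
PRECURSOR_DELETION = (779, 1658)  # inclusive, precursor numbering


def span_length(first: int, last: int) -> int:
    """Number of residues in an inclusive range."""
    return last - first + 1


residues_removed = span_length(*MATURE_DELETION)
precursor_offset = PRECURSOR_DELETION[0] - MATURE_DELETION[0]
# The numbering offset must be the same at both ends of the deleted range.
assert PRECURSOR_DELETION[1] - MATURE_DELETION[1] == precursor_offset

coding_nt_removed = 3 * residues_removed
designed_length_bp = 28 * 150 + 161  # 28 segments of 150 bp plus one of 161 bp

if __name__ == "__main__":
    print(residues_removed, precursor_offset, coding_nt_removed, designed_length_bp)
```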

The synthetic gene expressing human Factor VIII lacking the
central B domain was constructed as follows. Twenty-nine pairs of template
oligonucleotides (see below) were synthesized. The 5' template oligos were
105 bases long and the 3' oligos were 104 bases long (except for the last 3'
oligo, which was 125 residues long). The template oligos were designed so
that each annealing pair, composed of one 5' oligo and one 3' oligo, created a
19 basepair double-stranded region.

To facilitate the PCR and subsequent manipulations, the 5' ends of
the oligo pairs were designed to be invariant over the first 18 residues, allowing
a common pair of PCR primers to be used for amplification, and allowing the
same PCR conditions to be used for all pairs. The first 18 residues of each 5'
member of the template pair were cgc gaa ttc gga aga ccc (SEQ ID NO:110)
and the first 18 residues of each 3' member of the template pair were ggg gat
cct cac gtc tca (SEQ ID NO:43).
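The annealing and fill-in of one template pair can be simulated to check these design figures. The sketch uses pair 1 from the oligonucleotide list at the end of this example, reading each garbled base in the printed list as c (for instance "get" as "gct"); that reading is an assumption. In this reading the pair overlaps by 19 bases, the unique portion between the two 18-base invariant ends is 154 bp, and its first and last four bases are gcta and aacc, matching the overhang labels given for segments 1 and 2.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")
PRIMER_LEN = 18  # invariant bases at the 5' end of every template oligo


def clean(seq: str) -> str:
    """Lower-case a sequence and drop the spaces used to print it in triplets."""
    return "".join(seq.lower().split())


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str, min_len: int = 8) -> int:
    """Largest k for which the last k bases of left equal the first k of right."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


# Pair 1 from the oligonucleotide list, with garbled bases read as c.
FWD = clean("""
cgc gaa ttc gga aga ccc gct agc cgc cac
ccg ccg cta cta cct ggg cgc cgt gga gct
gtc ctg gga cta cat gca gag cga cct ggg
cga gct ccc cgt gga
""")
REV = clean("""
ggg gat cct cac gtc tca ggt ttt ctt gta
cac cac gct ggt gtt gaa ggg gaa gct ctt
ggg cac gcg ggg ggg gaa gcg ggc gtc cac
ggg gag ctc gcc ca
""")

# The 3' oligo is the other strand; its reverse complement reads 5'->3' on the
# same strand as FWD, and the bases they share form the annealed duplex.
overlap = longest_overlap(FWD, revcomp(REV))
product = FWD + revcomp(REV)[overlap:]                  # full-length fill-in product
unique = product[PRIMER_LEN:len(product) - PRIMER_LEN]  # between the invariant ends

if __name__ == "__main__":
    print(len(FWD), len(REV), overlap, len(product), len(unique))
    print(unique[:4], unique[-4:])
```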

Pairs of oligos were annealed and then extended and amplified by
PCR in a reaction mixture as follows: templates were annealed at 200 µg/ml
each in PCR buffer (10 mM Tris-HCl, 15 mM MgCl2, 50 mM KCl, 100 ng/ml
gelatin, pH 8.3). The PCR reactions contained 2 ng of the annealed template
oligos, 0.5 ng of each of the two 18-mer primers (described below), 200 µM of
each of the deoxynucleoside triphosphates, 10% by volume of DMSO and PCR
buffer as supplied by Boehringer Mannheim Biochemicals, in a final volume of
50 µl. After the addition of Taq polymerase (2.5 units, 0.5 µl; Boehringer
Mannheim Biochemicals), amplifications were conducted on a Perkin-Elmer
Thermal Cycler for 25 cycles (94 °C for 30 sec, 55 °C for 30 sec, and 72 °C for
30 sec). The final cycle was followed by a 10 minute extension at 72 °C.

The amplified fragments were digested with EcoRI and BamHI
(cleaving at the 5' and 3' ends of the fragments, respectively) and ligated to a
pUC9 derivative cut with EcoRI and BamHI.
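The invariant primer ends carry the EcoRI and BamHI sites used here, plus the Type IIS sites (BbsI in the 5' primer, BsmBI in the 3' primer) used for assembly. The scan below uses the standard recognition sequences. In the 18-base primers, the BbsI site ends two bases and the BsmBI site one base before the 3' end, which matches the cut positions of these enzymes (GAAGAC(2/6) and CGTCTC(1/5)), so cleavage falls just before the first amplified base.

```python
# Locate recognition sites in a sequence on both strands. Palindromic sites are
# reported once ('+'); non-palindromic sites found as their reverse complement
# are reported with '-'. Positions are 0-based starts on the given strand.

COMPLEMENT = str.maketrans("acgt", "tgca")
SITES = {
    "EcoRI": "gaattc",
    "BamHI": "ggatcc",
    "BbsI": "gaagac",   # Type IIS: cuts outside its recognition site
    "BsmBI": "cgtctc",  # Type IIS
    "BsaI": "ggtctc",   # Type IIS
}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(seq: str, sites: dict[str, str] = SITES) -> dict[str, list[tuple[int, str]]]:
    seq = seq.lower()
    hits: dict[str, list[tuple[int, str]]] = {}
    for name, site in sites.items():
        found = []
        for strand, motif in (("+", site), ("-", revcomp(site))):
            if strand == "-" and motif == site:
                continue  # palindrome: the '-' search would repeat the '+' hits
            start = seq.find(motif)
            while start != -1:
                found.append((start, strand))
                start = seq.find(motif, start + 1)
        if found:
            hits[name] = sorted(found)
    return hits


if __name__ == "__main__":
    print(find_sites("cgcgaattcggaagaccc"))  # 5' invariant end (SEQ ID NO:110)
    print(find_sites("ggggatcctcacgtctca"))  # 3' invariant end (SEQ ID NO:43)
```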

Individual clones were sequenced and a collection of plasmids
corresponding to the entire desired sequence was identified. The clones were
then assembled by multifragment ligation taking advantage of restriction sites
at the 3' ends of the PCR primers, immediately adjacent to the amplified
sequence. The 5' PCR primer contained a BbsI site, and the 3' PCR primer
contained a BsmBI site, positioned so that cleavage by the respective enzymes
preceded the first nucleotide of the amplified portion and left a 4 base 5'
overhang created by the first 4 bases of the amplified portion. Simultaneous
digestion with BbsI and BsmBI thus liberated the amplified portion with unique
4 base 5' overhangs at each end which contained none of the primer sequences.
In general these overhangs were not self-complementary, allowing
multifragment ligation reactions to produce the desired product with high
efficiency. The unique portion of the first 28 amplified oligonucleotide pairs
was thereby 154 basepairs, and after digestion each gave rise to a 150 bp
fragment with unique ends. The first and last fragments were not manipulated
in this manner, however, since they had other restriction sites designed into
them to facilitate insertion of the assembled sequence into an appropriate
mammalian expression vector. The actual assembly process proceeded as
follows.
Assembly of the Synthetic Factor VIII Gene

Step 1: 29 Fragments Assembled to Form 10 Fragments.

The 29 pairs of oligonucleotides, which formed segments 1 to 29
when base-paired, are described below.

Plasmids carrying segments 1, 5, 9, 12, 16, 20, 24 and 27 were
digested with EcoRI and BsmBI and the 170 bp fragments were isolated;
plasmids bearing segments 2, 3, 6, 7, 10, 13, 17, 18, 21, 25, and 28 were
digested with BbsI and BsmBI and the 170 bp fragments were isolated; and
plasmids bearing segments 4, 8, 11, 14, 19, 22, 26 and 29 were digested with
EcoRI and BbsI and the 2440 bp vector fragment was isolated. Fragments
bearing segments 1, 2, 3 and 4 were then ligated to generate segment "A";
fragments bearing segments 5, 6, 7 and 8 were ligated to generate segment "B";
fragments bearing segments 9, 10 and 11 were ligated to generate segment "C";
fragments bearing segments 12, 13, and 14 were ligated to generate segment
"D"; fragments bearing segments 16, 17, 18 and 19 were ligated to generate
segment "F"; fragments bearing segments 20, 21 and 22 were ligated to
generate segment "G"; fragments bearing segments 24, 25 and 26 were ligated
to generate segment "I"; and fragments bearing segments 27, 28 and 29 were
ligated to generate segment "J".

Step 2: Assembly of the 10 Resulting Fragments from Step 1
into Three Fragments.

Plasmids carrying the segments "A", "D" and "G" were digested with
EcoRI and BsmBI, plasmids carrying the segments B, 15, 23, and I were
digested with BbsI and BsmBI, and plasmids carrying the segments C, F, and J
were digested with EcoRI and BbsI. Fragments bearing segments A, B, and C
were ligated to generate segment "K"; fragments bearing segments D, 15, and F
were ligated to generate segment "O"; and fragments bearing segments G, 23, I,
and J were ligated to generate segment "P".

Step 3: Assembly of the Final Three Pieces.

The plasmid bearing segment K was digested with EcoRI and
BsmBI, the plasmid bearing segment O was digested with BbsI and BsmBI,
and the plasmid bearing segment P was digested with EcoRI and BbsI. The three
resulting fragments were ligated to generate segment S.
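The three assembly steps can be encoded as a tree to check that they use each of segments 1 to 29 exactly once and in order, and that the enzyme pairs follow one rule: the first piece of each ligation is cut with EcoRI and BsmBI, internal pieces with BbsI and BsmBI, and the last piece (carrying the vector) with EcoRI and BbsI. That rule is inferred from the digestions listed above rather than stated in the text.

```python
# The hierarchical assembly above as a tree. Keys are the intermediate segment
# names used in the text; integers are the original segments 1-29.

PLAN: dict[str, list] = {
    # Step 1
    "A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11], "D": [12, 13, 14],
    "F": [16, 17, 18, 19], "G": [20, 21, 22], "I": [24, 25, 26], "J": [27, 28, 29],
    # Step 2 (segments 15 and 23 are carried over unassembled)
    "K": ["A", "B", "C"], "O": ["D", 15, "F"], "P": ["G", 23, "I", "J"],
    # Step 3
    "S": ["K", "O", "P"],
}


def expand(part) -> list[int]:
    """Flatten an intermediate into the ordered list of original segments."""
    if isinstance(part, int):
        return [part]
    return [seg for child in PLAN[part] for seg in expand(child)]


def digestion_groups(products: list[str]) -> dict[str, set]:
    """Which pieces are cut with which enzyme pair in one assembly step."""
    groups = {"EcoRI+BsmBI": set(), "BbsI+BsmBI": set(), "EcoRI+BbsI (vector)": set()}
    for name in products:
        pieces = PLAN[name]
        groups["EcoRI+BsmBI"].add(pieces[0])
        groups["BbsI+BsmBI"].update(pieces[1:-1])
        groups["EcoRI+BbsI (vector)"].add(pieces[-1])
    return groups


if __name__ == "__main__":
    assert expand("S") == list(range(1, 30))
    print(digestion_groups(["K", "O", "P"]))
```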

Step 4: Insertion of the Synthetic Gene in a Mammalian
Expression Vector.

The plasmid bearing segment S was digested with NheI and NotI and
inserted between the NheI and EagI sites of plasmid GDSlNEgl to generate
plasmid cd51sf8b.

Sequencing and Correction of the Synthetic Factor VIII Gene

After assembly of the synthetic gene, it was discovered that there
were two undesired residues encoded in the sequence. One was an Arg residue
at 749, which is present in the GenBank sequence entry originating from
Genentech but is not in the sequence reported by Genentech in the literature.
The other was an Ala residue at 146, which should have been Pro. This







WO 98/12207 PCT/US97/16639 

mutation arose at an unidentified step subsequent to the sequencing of the 29
constituent fragments. The Pro749Arg mutation was corrected by
incorporating the desired change in a PCR primer (ctg ctt ctg acg cgt gct ggg
gtg gcg gga gtt; SEQ ID NO:44) that included the MluI site at position 2335 of
the sequence below (sequence of HindIII to NotI segment) and amplifying
between that primer and a primer (ctg ctg aaa gtc tcc agc tgc; SEQ ID NO:44)
5' to the SgrAI site at 2225. The SgrAI to MluI fragment was then inserted into
the expression vector at the cognate sites, and the correct sequence change was
verified by sequencing. The Pro146Ala mutation was
corrected by incorporating the desired sequence change in an oligonucleotide
(ggc agg tgc tta agg aga acg gcc cta tgg cca; SEQ ID NO:46) bearing the AflII
site at residue 504, amplifying the fragment resulting from a PCR reaction
between that oligo and the primer having sequence cgt tgt tct tca tac gcg tct ggg
gct cct cgg ggc (SEQ ID NO:109), cutting the resulting PCR fragment with
AflII and AvrII (at residue 989), inserting the corrected fragment into the
expression vector and confirming the construction by sequencing.
Construction of a Matched Native Gene Expressing Human Factor VIII
Lacking the Central B Domain

A matched Factor VIII B domain deletion expression plasmid having
the native codon sequence was constructed by introducing NheI at the 5' end of
the mature coding sequence using primer cgc caa ggg cta gcc gcc acc aga aga
tac tac ctg ggt (SEQ ID NO:47), amplifying between that primer and the primer
att cgt agt tgg ggt tcc tct gga cag (corresponding to residues 1067 to 1093 of the
sequence shown below), cutting with NheI and AflII (residue 345 in the
sequence shown below) and inserting the resulting fragment into an
appropriately cleaved plasmid bearing native Factor VIII. The B domain
deletion was created by overlap PCR using ctg tat ttg atg aga acc g
(corresponding to residues 1813 to 1831 below) and caa gac tgg tgg ggt ggc att
aaa ttg ctt t (SEQ ID NO:48) (2342 to 2372 on the complement below) for the 5'
end of the overlap, and aat gcc acc cca cca gtc ttg aaa cgc ca (SEQ ID NO:49)
(2352 to 2380 on the sequence below) and cat ctg gat att gca ggg ag (SEQ ID
NO:50) (3145 to 3164) for the 3' end. The products of the two individual PCR
reactions were then mixed and reamplified using the outermost primers, and the
resulting fragment was cleaved with Asp718 (KpnI isoschizomer, 1837 on the
sequence below) and PflMI (3100 on the sequence below) and inserted into the
appropriately cleaved expression plasmid bearing native Factor VIII.
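The two inner primers of the overlap PCR must share sequence so that the two products can anneal. Because SEQ ID NO:48 is written on the complementary strand, the sketch compares its reverse complement with SEQ ID NO:49. The 21-base match it finds agrees with the stated coordinates (2342 to 2372 and 2352 to 2380 overlap over positions 2352 to 2372).

```python
# Check how the two inner overlap-PCR primers pair with each other.
# Primer sequences are copied from the paragraph above.

COMPLEMENT = str.maketrans("acgt", "tgca")


def clean(seq: str) -> str:
    """Lower-case a sequence and drop the spaces used to print it in triplets."""
    return "".join(seq.lower().split())


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def longest_overlap(left: str, right: str, min_len: int = 8) -> int:
    """Largest k for which the last k bases of left equal the first k of right."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


INNER_REV = clean("caa gac tgg tgg ggt ggc att aaa ttg ctt t")  # SEQ ID NO:48
INNER_FWD = clean("aat gcc acc cca cca gtc ttg aaa cgc ca")     # SEQ ID NO:49

overlap = longest_overlap(revcomp(INNER_REV), INNER_FWD)
# Sequence covered by the two primers once the shared bases are merged.
spanned = revcomp(INNER_REV) + INNER_FWD[overlap:]

if __name__ == "__main__":
    print(overlap, len(spanned), spanned)
```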

The complete sequence (SEQ ID NO:41) of the native human Factor
VIII gene deleted for the central B region is presented in Figure 12. The
complete sequence (SEQ ID NO:42) of the synthetic Factor VIII gene deleted
for the central B region is presented in Figure 13.

Preparation and assay of expression plasmids

Two independent plasmid isolates of the native, and four
independent isolates of the synthetic, Factor VIII expression plasmid were
separately propagated in bacteria and their DNA prepared by CsCl buoyant
density centrifugation followed by phenol extraction. Analysis of the
supernatants of COS cells transfected with the plasmids showed that the
synthetic gene gave rise to approximately four times as much Factor VIII as did
the native gene.

COS cells were then transfected with 5 µg of each Factor VIII
construct per 6 cm dish using the DEAE-dextran method. At 72 hours
post-transfection, 4 ml of fresh medium containing 10% calf serum was added to
each plate. A sample of medium was taken from each plate 12 hr later.
Samples were tested by ELISA using a mouse anti-human Factor VIII light chain
monoclonal antibody and a peroxidase-conjugated goat anti-human Factor VIII
polyclonal antibody. Purified human plasma Factor VIII was used as a
standard. Cells transfected with the synthetic Factor VIII gene construct
expressed 138 ± 20.2 ng/ml (equivalent ng/ml non-deleted Factor VIII) of
Factor VIII (n=4), while the cells transfected with the native Factor VIII gene
expressed 33.5 ± 0.7 ng/ml (equivalent ng/ml non-deleted Factor VIII) of
Factor VIII (n=2).
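The fold difference between the two constructs follows from the means above. The sketch adds a first-order error estimate, assuming the ± values are standard deviations of independent measurements; the text does not specify which kind of error is reported.

```python
import math

# Ratio of two means with first-order (relative-error) propagation.
# Assumption: the +/- values are independent standard deviations.


def ratio_with_error(mean_a: float, err_a: float,
                     mean_b: float, err_b: float) -> tuple[float, float]:
    ratio = mean_a / mean_b
    rel = math.sqrt((err_a / mean_a) ** 2 + (err_b / mean_b) ** 2)
    return ratio, ratio * rel


# Synthetic (138 +/- 20.2 ng/ml) versus native (33.5 +/- 0.7 ng/ml) Factor VIII.
fold, fold_err = ratio_with_error(138.0, 20.2, 33.5, 0.7)

if __name__ == "__main__":
    print(f"synthetic / native = {fold:.1f} +/- {fold_err:.1f}")
```

With these inputs the ratio is about 4.1 ± 0.6, in line with the approximately fourfold difference reported above.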

The following template oligonucleotides were used for construction 
of the synthetic Factor VIII gene. 

rl bbs 1 for (gcta)
cgc gaa ttc gga aga ccc gct agc cgc cac 1 rl
ccg ccg cta cta cct ggg cgc cgt gga gct
gtc ctg gga cta cat gca gag cga cct ggg
cga gct ccc cgt gga (SEQ ID NO:51)

ggg gat cct cac gtc tca ggt ttt ctt gta 1 bam
cac cac gct ggt gtt gaa ggg gaa gct ctt
ggg cac gcg ggg ggg gaa gcg ggc gtc cac
ggg gag ctc gcc ca (SEQ ID NO:52)

rl bbs 2 for (aacc)
cgc gaa ttc gga aga ccc aac cct gtt cgt 2 rl
gga gtt cac cga cca cct gtt caa cat tgc
caa gcc gcg ccc ccc ctg gat ggg cct gct
ggg ccc cac cat cca (SEQ ID NO:53)

ggg gat cct cac gtc tca gtg cag gct gac 2 bam
ggg gtg gct ggc cat gtt ctt cag ggt gat
cac cac ggt gtc gta cac ctc ggc ctg gat
ggt ggg gcc cag ca (SEQ ID NO:54)

rl bbs 3 for (gcac)
cgc gaa ttc gga aga ccc gca cgc cgt ggg   3 rl
cgt gag cta ctg gaa ggc cag cga ggg cgc
cga gta cga cga cca gac gtc cca gcg cga
gaa gga gga cga caa (SEQ ID NO:55)

ggg gat cct cac gtc tca gct ggc cat agg   3 bam
gcc gtt ctc ctt aag cac ctg cca cac gta
ggt gtg gct ccc ccc cgg gaa cac ctt gtc
gtc ctc ctt ctc gc (SEQ ID NO:56)

rl bbs 4 for (cagc)
cgc gaa ttc gga aga ccc cag cga ccc cct   4 rl
gtg cct gac cta cag cta cct gag cca cgt
gga cct ggt gaa gga tct gaa cag cgg gct
gat cgg cgc cct gct (SEQ ID NO:57)

ggg gat cct cac gtc tca gaa cag cag gat   4 bam
gaa ctt gtg cag ggt ctg ggt ttt ctc ctt
ggc cag gct gcc ctc gcg aca cac cag cag
ggc gcc gat cag cc (SEQ ID NO:58)
rl bbs 5 for (gttc)
cgc gaa ttc gga aga ccc gtt cgc cgt gtt   5 rl
cga cga ggg gaa gag ctg gca cag cga gac
taa gaa cag cct gat gca gga ccg cga cgc
cgc cag cgc ccg cgc (SEQ ID NO:59)

ggg gat cct cac gtc tca gtg gca gcc gat   5 bam
cag gcc ggg cag gct gcg gtt cac gta gcc
gtt aac ggt gtg cat ctt ggg cca ggc gcg
ggc gct ggc ggc gt (SEQ ID NO:60)

rl bbs 6 for (ccac)
cgc gaa ttc gga aga ccc cca ccg caa gag   6 rl
cgt gta ctg gca cgt cat cgg cat ggg cac
cac ccc tga ggt gca cag cat ctt cct gga
ggg cca cac ctt cct (SEQ ID NO:61)

ggg gat cct cac gtc tca cag ggt ctg ggc   6 bam
agt cag gaa ggt gat ggg gct gat ctc cag
gct ggc ctg gcg gtg gtt gcg cac cag gaa
ggt gtg gcc ctc ca (SEQ ID NO:62)



rl bbs 7 for (cctg)
cgc gaa ttc gga aga ccc cct gct gat gga   7 rl
cct agg cca gtt cct gct gtt ctg cca cat
cag cag cca cca gca cga cgg cat gga ggc
tta cgt gaa ggt gga (SEQ ID NO:63)

ggg gat cct cac gtc tca gtc gtc gtc gta   7 bam
gtc ctc ggc ctc ctc gtt gtt ctt cat gcg
cag ctg ggg ctc ctc ggg gca gct gtc cac
ctt cac gta agc ct (SEQ ID NO:64)

rl bbs 8 for (cgac)
cgc gaa ttc gga aga ccc cga cct gac cga   8 rl
cag cga gat gga tgt cgt acg ctt cga cga
cga caa cag ccc cag ctt cat cca gat ccg
cag cgt ggc caa gaa (SEQ ID NO:65)

ggg gat cct cac gtc tca tac tag cgg ggc   8 bam
gta gtc cca gtc ctc ctc ctc ggc ggc gat
gta gtg cac cca ggt ctt agg gtg ctt ctt
ggc cac gct gcg ga (SEQ ID NO:66)

rl bbs 9 for (agta)
cgc gaa ttc gga aga ccc agt act ggc ccc   9 rl
cga cga ccg cag cta caa gag cca gta cct
gaa caa cgg ccc cca gcg cat cgg ccg caa
gta caa gaa ggt gcg (SEQ ID NO:67)

ggg gat cct cac gtc tca gag gat gcc gga   9 bam
ctc gtg ctg gat ggc ctc gcg ggt ctt gaa
agt ctc gtc ggt gta ggc cat gaa gcg cac
ctt ctt gta ctt gc (SEQ ID NO:68)
rl bbs 10 for (cctc)
cgc gaa ttc gga aga ccc cct cgg ccc cct   10 rl
gct gta cgg cga ggt ggg cga cac cct gct
gat cat ctt caa gaa cca ggc cag cag gcc
cta caa cat cta ccc (SEQ ID NO:69)

ggg gat cct cac gtc tca ctt cag gtg ctt   10 bam
cac gcc ctt ggg cag gcg gcg gct gta cag
ggg gcg cac gtc ggt gat gcc gtg ggg gta
gat gtt gta ggg cc (SEQ ID NO:70)

rl bbs 11 for (gaag)
cgc gaa ttc gga aga ccc gaa gga ctt ccc   11 rl
cat cct gcc cgg cga gat ctt caa gta caa
gtg gac cgt gac cgt gga gga cgg ccc cac
caa gag cga ccc ccg (SEQ ID NO:71)

ggg gat cct cac gtc tca gcc gat cag tcc   11 bam
gga ggc cag gtc gcg ctc cat gtt cac gaa
gct gct gta gta gcg ggt cag gca gcg ggg
gtc gct ctt ggt gg (SEQ ID NO:72)



rl bbs 12 for (cggc)
cgc gaa ttc gga aga ccc cgg ccc cct gct   12 rl
gat ctg cta caa gga gag cgt gga cca gcg
cgg caa cca gat cat gag cga caa gcg caa
cgt gat cct gtt cag (SEQ ID NO:73)

ggg gat cct cac gtc tca agc ggg gtt ggg   12 bam
cag gaa gcg ctg gat gtt ctc ggt cag ata
cca gct gcg gtt ctc gtc gaa cac gct gaa
cag gat cac gtt gc (SEQ ID NO:74)

rl bbs 13 for (cgct)
cgc gaa ttc gga aga ccc cgc tgg cgt gca   13 rl
gct gga aga tcc cga gtt cca ggc cag caa
cat cat gca cag cat caa cgg cta cgt gtt
cga cag cct gca gct (SEQ ID NO:75)

ggg gat cct cac gtc tca cag gaa gtc ggt   13 bam
ctg ggc gcc gat gct cag gat gta cca gta
ggc cac ctc atg cag gca cac gct cag ctg
cag gct gtc gaa ca (SEQ ID NO:76)

rl bbs 14 for (cctg)
cgc gaa ttc gga aga ccc cct gag cgt gtt   14 rl
ctt ctc cgg gta tac ctt caa gca caa gat
ggt gta cga gga cac cct gac cct gtt ccc
ctt ctc cgg cga gac (SEQ ID NO:77)

ggg gat cct cac gtc tca gtt gcg gaa gtc   14 bam
gct gtt gtg gca gcc cag aat cca cag gcc
ggg gtt ctc cat aga cat gaa cac agt ctc
gcc gga gaa ggg ga (SEQ ID NO:78)
rl bbs 15 for (caac)
cgc gaa ttc gga aga ccc caa ccg cgg bat   15 rl
gac tgc cct gct gaa agt ctc cag ctg cga
caa gaa cac cgg cga cta cta cga gga cag
cta cga gga cat ctc (SEQ ID NO:79)

ggg gat cct cac gtc tca gcg gtg gcg gga   15 bam
gtt ttg gga gaa gga gcg ggg ctc gat ggc
gtt gtt ctt gga cag cag gta ggc gga gat
gtc ctc gta gct gt (SEQ ID NO:80)

rl bbs 16 for (ccgc)
cgc gaa ttc gga aga ccc ccg cag cac gcg   16 rl
tca gaa gca gtt caa cgc cac ccc ccc cgt
gct gaa gcg cca cca gcg cga gat cac ccg
cac cac cct gca aag (SEQ ID NO:81)

ggg gat cct cac gtc tca gat gtc gaa gtc   16 bam
ctc ctt ctt cat ctc cac gct gat ggt gtc
gtc gta gtc gat ctc ctc ctg gtc gct ttg
cag ggt ggt gcg gg (SEQ ID NO:82)



rl bbs 17 for (catc)
cgc gaa ttc gga aga ccc cat cta cga cga   17 rl
gga cga gaa cca gag ccc ccg ctc ctt cca
aaa gaa aac ccg cca cta ctt cat cgc cgc
cgt gga gcg cct gtg (SEQ ID NO:83)

ggg gat cct cac gtc tca ctg ggg cac gct   17 bam
gcc gct ctg ggc gcg gtt gcg cag gac gtg
ggg gct gct gct cat gcc gta gtc cca cag
gcg ctc cac ggc gg (SEQ ID NO:84)

rl bbs 18 for (ccag)
cgc gaa ttc gga aga ccc cca gtt caa gaa   18 rl
ggt ggt gtt cca gga gtt cac cga cgg cag
ctt cac cca gcc cct gta ccg cgg cga gct
gaa cga gca cct ggg (SEQ ID NO:85)

ggg gat cct cac gtc tca ggc ttg gtt gcg   18 bam
gaa ggt cac cat gat gtt gtc ctc cac ctc
ggc gcg gat gta ggg gcc gag cag gcc cag
gtg ctc gtt cag ct (SEQ ID NO:86)

rl bbs 19 for (agcc)
cgc gaa ttc gga aga ccc agc ctc ccg gcc   19 rl
cta ctc ctt cta ctc ctc cct gat cag cta
cga gga gga cca gcg cca ggg cgc cga gcc
ccg caa gaa ctt cgt (SEQ ID NO:87)

ggg gat cct cac gtc tca ctc gtc ctt ggt   19 bam
ggg ggc cat gtg gtg ctg cac ctt cca gaa
gta ggt ctt agt ctc gtt ggg ctt cac gaa
gtt ctt gcg ggg ct (SEQ ID NO:88)
rl bbs 20 for (cgag)
cgc gaa ttc gga aga ccc cga gtt cga ctg   20 rl
caa ggc ctg ggc cta ctt cag cga cgt gga
cct gga gaa gga cgt gca cag cgg cct gat
cgg ccc cct gct ggt (SEQ ID NO:89)

ggg gat cct cac gtc tca gaa cag ggc aaa   20 bam
ttc ctg cac agt cac ctg cct ccc gtg ggg
ggg gtt cag ggt gtt ggt gtg gca cac cag
cag ggg gcc gat ca (SEQ ID NO:90)
rl bbs 21 for (gttc)
cgc gaa ttc gga aga ccc gtt ctt cac cat   21 rl
ctt cga cga gac taa gag ctg gta ctt cac
cga gaa cat gga gcg caa ctg ccg cgc ccc
ctg caa cat cca gat (SEQ ID NO:91)

ggg gat cct cac gtc tca cag ggt gtc cat   21 bam
gat gta gcc gtt gat ggc gtg gaa gcg gta
gtt ctc ctt gaa ggt ggg atc ttc cat ctg
gat gtt gca ggg gg (SEQ ID NO:92)

rl bbs 22 for (cctg)
cgc gaa ttc gga aga ccc cct gcc cgg cct   22 rl
ggt gat ggc cca gga cca gcg cat ccg ctg
gta cct gct gtc tat ggg cag caa cga gaa
cat cca cag cat cca (SEQ ID NO:93)

ggg gat cct cac gtc tca gta cag gtt gta   22 bam
cag ggc cat ctt gta ctc ctc ctt ctt gcg
cac ggt gaa aac gtg gcc gct gaa gtg gat
gct gtg gat gtt ct (SEQ ID NO:94)

rl bbs 23 for (gtac)
cgc gaa ttc gga aga ccc gta ccc cgg cgt   23 rl
gtt cga gac tgt gga gat gct gcc cag caa
ggc cgg gat ctg gcg cgt gga gtg cct gat
cgg cga gca cct gca (SEQ ID NO:95)

ggg gat cct cac gtc tca gct ggc cat gcc   23 bam
cag ggg ggt ctg gca ctt gtt gct gta cac
cag gaa cag ggt gct cat gcc ggc gtg cag
gtg ctc gcc gat ca (SEQ ID NO:96)

rl bbs 24 for (cagc)
cgc gaa ttc gga aga ccc cag cgg cca cat   24 rl
ccg cga ctt cca gat cac cgc cag cgg cca
gta cgg cca gtg ggc tcc caa gct ggc ccg
cct gca cta cag cgg (SEQ ID NO:97)

ggg gat cct cac gtc tca cat ggg ggc cag   24 bam
cag gtc cac ctt gat cca gga gaa ggg ctc
ctt ggt cga cca ggc gtt gat gct gcc gct
gta gtg cag gcg gg (SEQ ID NO:98)
rl bbs 25 for (catg)
cgc gaa ttc gga aga ccc cat gat cat cca   25 rl
cgg cat caa gac cca ggg cgc ccg cca gaa
gtt cag cag cct gta cat cag cca gtt cat
cat cat gta ctc tct (SEQ ID NO:99)

ggg gat cct cac gtc tca gtt gcc gaa gaa   25 bam
cac cat cag ggt gcc ggt gct gtt gcc gcg
gta ggt ctg cca ctt ctt gcc gtc tag aga
gta cat gat gat ga (SEQ ID NO:100)

rl bbs 26 for (caac)
cgc gaa ttc gga aga ccc caa cgt gga cag   26 rl
cag cgg cat caa gca caa cat ctt caa ccc
ccc cat cat cgc ccg cta cat ccg cct gca
ccc cac cca cta cag (SEQ ID NO:101)

ggg gat cct cac gtc tca gcc cag ggg cat   26 bam
gct gca gct gtt cag gtc gca gcc cat cag
ctc cat gcg cag ggt gct gcg gat gct gta
gtg ggt ggg gtg ca (SEQ ID NO:102)



rl bbs 27 for (gggc)
cgc gaa ttc gga aga ccc ggg cat gga gag   27 rl
caa ggc cat cag cga cgc cca gat cac cgc
ctc cag cta ctt cac caa cat gtt cgc cac
ctg gag ccc cag caa (SEQ ID NO:103)




ggg gat cct cac gtc tca cca ctc ctt ggg   27 bam
gtt gtt cac ctg ggg gcg cca ggc gtt gct
gcg gcc ctg cag gtg cag gcg ggc ctt gct
ggg gct cca ggt gg (SEQ ID NO:104)

rl bbs 28 for (gtgg)
cgc gaa ttc gga aga ccc gtg gct gca ggt   28 rl
gga ctt cca gaa aac cat gaa ggt gac tgg
cgt gac cac cca ggg cgt caa gag cct gct
gac cag cat gta cgt (SEQ ID NO:105)

ggg gat cct cac gtc tca ctt gcc gtt ttg   28 bam
gaa gaa cag ggt cca ctg gtg gcc gtc ctg
gct gct gct gat cag gaa ctc ctt cac gta
cat gct ggt cag ca (SEQ ID NO:106)

rl bbs 29 for (caag)
cgc gaa ttc gga aga ccc caa ggt gaa ggt   29 rl
gtt cca ggg caa cca gga cag ctt cac acc
ggt cgt gaa cag cct gga ccc ccc cct gct
gac ccg cta cct gcg (SEQ ID NO:107)

ggg gat cct cac gtc tca gcg gcc gct tca   29 bam
gta cag gtc ctg ggc ctc gca gcc cag cac
ctc cat gcg cag ggc gat ctg gtg cac cca
gct ctg ggg gtg gat gcg cag gta gcg ggt
cag ca (SEQ ID NO:108)
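The four-letter tags in the "for" labels (gcta, aacc, gcac, ...) appear to be the single-stranded overhangs exposed after Type IIS digestion, and each "bam" oligo appears to carry the overhang that joins its fragment to the next one. The sketch below checks this for fragments 1 and 2. It rests on assumptions the text does not state: that "bbs" denotes a BbsI site (GAAGAC, cutting 2/6) and that the CGTCTC motif next to the BamHI site in the "bam" oligos is a BsmBI site (cutting 1/5). The helper names are hypothetical.

```python
import re

# Assumed enzymes: (recognition site, cut offset on the site-bearing strand)
BBSI = ("gaagac", 2)   # BbsI, GAAGAC(2/6)
BSMBI = ("cgtctc", 1)  # BsmBI, CGTCTC(1/5)

_COMPLEMENT = str.maketrans("acgt", "tgca")


def clean(oligo: str) -> str:
    """Join codon-spaced text into one lowercase DNA string."""
    return re.sub(r"[^acgt]", "", oligo.lower())


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overhang(oligo: str, enzyme: tuple) -> str:
    """4-nt 5' overhang left on this strand when a Type IIS enzyme cuts
    `offset` nt past its recognition site (the other strand is cut 4 nt further)."""
    site, offset = enzyme
    seq = clean(oligo)
    start = seq.index(site) + len(site) + offset
    return seq[start:start + 4]


FOR_1 = "cgc gaa ttc gga aga ccc gct agc cgc cac"  # first line of "rl bbs 1 for"
BAM_1 = "ggg gat cct cac gtc tca ggt ttt ctt gta"  # first line of "1 bam"
FOR_2 = "cgc gaa ttc gga aga ccc aac cct gtt cgt"  # first line of "rl bbs 2 for"

print(overhang(FOR_1, BBSI))            # label of fragment 1: gcta
print(revcomp(overhang(BAM_1, BSMBI)))  # end of fragment 1, read on the top strand
print(overhang(FOR_2, BBSI))            # label of fragment 2: aacc
```

Under these assumptions the 3' end of fragment 1 (aacc) matches the 5' overhang of fragment 2, which is what directional ligation of the fragments would require.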
The codon usage for the synthetic and native genes described above
is presented in Tables 3 and 4, respectively.
TABLE 3: Codon Frequency of the Synthetic Factor VIII B Domain Deleted Gene

AA    Codon   Number    /1000  Fraction
Gly   GGG       7.00     4.82      0.09
Gly   GGA       1.00     0.69      0.01
Gly   GGT       0.00     0.00      0.00
Gly   GGC      74.00    50.93      0.90
Glu   GAG      81.00    55.75      0.96
Glu   GAA       3.00     2.06      0.04
Asp   GAT       4.00     2.75      0.05
Asp   GAC      78.00    53.68      0.95
Val   GTG      77.00    52.99      0.88
Val   GTA       2.00     1.38      0.02
Val   GTT       2.00     1.38      0.02
Val   GTC       7.00     4.82      0.08
Ala   GCG       0.00     0.00      0.00
Ala   GCA       0.00     0.00      0.00
Ala   GCT       3.00     2.06      0.04
Ala   GCC      67.00    46.11      0.96
Arg   AGG       2.00     1.38      0.03
Arg   AGA       0.00     0.00      0.00
Ser   AGT       0.00     0.00      0.00
Ser   AGC      97.00    66.76      0.81
Lys   AAG      75.00    51.62      0.94
Lys   AAA       5.00     3.44      0.06
Asn   AAT       0.00     0.00      0.00
Asn   AAC      63.00    43.36      1.00
Met   ATG      43.00    29.59      1.00
Ile   ATA       0.00     0.00      0.00
Ile   ATT       2.00     1.38      0.03
Ile   ATC      72.00    49.55      0.97
Thr   ACG       2.00     1.38      0.02
Thr   ACA       1.00     0.69      0.01
Thr   ACT      10.00     6.88      0.12
Thr   ACC      70.00    48.18      0.84
Trp   TGG      28.00    19.27      1.00
End   TGA       1.00     0.69      1.00
Cys   TGT       1.00     0.69      0.05
Cys   TGC      18.00    12.39      0.95
End   TAG       0.00     0.00      0.00
End   TAA       0.00     0.00      0.00
Tyr   TAT       2.00     1.38      0.03
Tyr   TAC      66.00    45.42      0.97
Leu   TTG       0.00     0.00      0.00
Leu   TTA       0.00     0.00      0.00
Phe   TTT       1.00     0.69      0.01
Phe   TTC      76.00    52.31      0.99
Ser   TCG       1.00     0.69      0.01
Ser   TCA       0.00     0.00      0.00
Ser   TCT       3.00     2.06      0.03
Ser   TCC      19.00    13.08      0.16
Arg   CGG       1.00     0.69      0.01
Arg   CGA       0.00     0.00      0.00
Arg   CGT       1.00     0.69      0.01
Arg   CGC      69.00    47.49      0.95
Gln   CAG      62.00    42.67      0.93
Gln   CAA       5.00     3.44      0.07
His   CAT       1.00     0.69      0.02
His   CAC      50.00    34.41      0.98
Leu   CTG     118.00    81.21      0.94
Leu   CTA       3.00     2.06      0.02
Leu   CTT       1.00     0.69      0.01
Leu   CTC       3.00     2.06      0.02
Pro   CCG       4.00     2.75      0.05
Pro   CCA       0.00     0.00      0.00
Pro   CCT       3.00     2.06      0.04
Pro   CCC      68.00    46.80      0.91
TABLE 4: Codon Frequency Table of the Native Factor VIII B Domain Deleted Gene

AA    Codon   Number    /1000  Fraction
Gly   GGG      12.00     8.26      0.15
Gly   GGA      34.00    23.40      0.41
Gly   GGT      16.00    11.01      0.20
Gly   GGC      20.00    13.76      0.24
Glu   GAG      33.00    22.71      0.39
Glu   GAA      51.00    35.10      0.61
Asp   GAT      55.00    37.85      0.67
Asp   GAC      27.00    18.58      0.33
Val   GTG      29.00    19.96      0.33
Val   GTA      19.00    13.08      0.22
Val   GTT      17.00    11.70      0.19
Val   GTC      23.00    15.83      0.26
Ala   GCG       2.00     1.38      0.03
Ala   GCA      18.00    12.39      0.25
Ala   GCT      31.00    21.34      0.44
Ala   GCC      20.00    13.76      0.28
Arg   AGG      18.00    12.39      0.25
Arg   AGA      22.00    15.14      0.30
Ser   AGT      22.00    15.14      0.18
Ser   AGC      24.00    16.52      0.20
Lys   AAG      32.00    22.02      0.40
Lys   AAA      48.00    33.04      0.60
Asn   AAT      38.00    26.15      0.60
Asn   AAC      25.00    17.21      0.40
Met   ATG      43.00    29.59      1.00
Ile   ATA      13.00     8.95      0.18
Ile   ATT      36.00    24.78      0.49
Ile   ATC      25.00    17.21      0.34
Thr   ACG       1.00     0.69      0.01
Thr   ACA      23.00    15.83      0.28
Thr   ACT      36.00    24.78      0.43
Thr   ACC      23.00    15.83      0.28
Trp   TGG      28.00    19.27      1.00
End   TGA       1.00     0.69      1.00
Cys   TGT       7.00     4.82      0.37
Cys   TGC      12.00     8.26      0.63
End   TAG       0.00     0.00      0.00
End   TAA       0.00     0.00      0.00
Tyr   TAT      41.00    28.22      0.60
Tyr   TAC      27.00    18.58      0.40
Leu   TTG      20.00    13.76      0.16
Leu   TTA      10.00     6.88      0.08
Phe   TTT      45.00    30.97      0.58
Phe   TTC      32.00    22.02      0.42
Ser   TCG       2.00     1.38      0.02
Ser   TCA      27.00    18.58      0.22
Ser   TCT      27.00    18.58      0.22
Ser   TCC      18.00    12.39      0.15
Arg   CGG       6.00     4.13      0.08
Arg   CGA      10.00     6.88      0.14
Arg   CGT       7.00     4.82      0.10
Arg   CGC      10.00     6.88      0.14
Gln   CAG      42.00    28.91      0.63
Gln   CAA      25.00    17.21      0.37
His   CAT      28.00    19.27      0.55
His   CAC      23.00    15.83      0.45
Leu   CTG      36.00    24.78      0.29
Leu   CTA      15.00    10.32      0.12
Leu   CTT      24.00    16.52      0.19
Leu   CTC      20.00    13.76      0.16
Pro   CCG       1.00     0.69      0.01
Pro   CCA      32.00    22.02      0.43
Pro   CCT      26.00    17.89      0.35
Pro   CCC      15.00    10.32      0.20
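The /1000 and Fraction columns in Tables 3 and 4 appear to follow directly from the Number column: /1000 is 1000 × count / total codons, and Fraction is count / all codons for the same amino acid. A total of 1453 codons reproduces the /1000 column (for example, 43 Met codons give 29.59 per 1000). The sketch below recomputes the Gly rows of Table 4 under that assumption; the function name is illustrative, not taken from the source.

```python
from collections import defaultdict


def codon_table_columns(counts: dict, total_codons: int) -> dict:
    """Map {(aa, codon): count} to {(aa, codon): (per_1000, fraction)},
    rounded to two decimals as in Tables 3 and 4."""
    aa_totals = defaultdict(int)
    for (aa, _codon), n in counts.items():
        aa_totals[aa] += n
    columns = {}
    for (aa, codon), n in counts.items():
        per_1000 = round(1000 * n / total_codons, 2)
        fraction = round(n / aa_totals[aa], 2) if aa_totals[aa] else 0.0
        columns[(aa, codon)] = (per_1000, fraction)
    return columns


# Gly rows of Table 4 (native gene); 1453 = total codons implied by the /1000 column
native_gly = {("Gly", "GGG"): 12, ("Gly", "GGA"): 34, ("Gly", "GGT"): 16, ("Gly", "GGC"): 20}
for (aa, codon), (per_1000, fraction) in codon_table_columns(native_gly, 1453).items():
    print(aa, codon, per_1000, fraction)
```

Because Fraction is normalized per amino acid, the rows for one amino acid must be passed together; passing a single codon would always yield a fraction of 1.0.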




Use 

The synthetic genes of the invention are useful for expressing a
protein normally expressed in mammalian cells in cell culture (e.g., for
commercial production of human proteins such as hGH, TPA, Factor VIII, and
Factor IX). The synthetic genes of the invention are also useful for gene
therapy. For example, a synthetic gene encoding a selected protein can be
introduced into a cell that can express the protein, creating a cell that can
be administered to a patient in need of the protein. Such cell-based gene
therapy techniques are well known to those skilled in the art; see, e.g.,
Anderson et al., U.S. Patent No. 5,399,349; Mulligan and Wilson, U.S. Patent
No. 5,460,959.

What is claimed is: 

1. A synthetic gene encoding a protein normally expressed in a
eukaryotic cell wherein at least one non-preferred or less preferred codon in a
natural gene encoding said protein has been replaced by a preferred codon
encoding the same amino acid, said synthetic gene being capable of expressing
said protein at a level which is at least 110% of that expressed by said natural
gene in an in vitro mammalian cell culture system under identical conditions.

2. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

3. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

4. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

5. The synthetic gene of claim 1 wherein said synthetic gene
comprises fewer than 5 occurrences of the sequence CG.

6. The synthetic gene of claim 1 wherein at least 10% of the codons 
in said natural gene are non-preferred codons. 
7. The synthetic gene of claim 1 wherein at least 50% of the codons
in said natural gene are non-preferred codons.

8. The synthetic gene of claim 1 wherein at least 50% of the non-
preferred codons and less preferred codons present in said natural gene have
been replaced by preferred codons.

9. The synthetic gene of claim 1, wherein at least 90% of the non-
preferred codons and less preferred codons present in said natural gene have
been replaced by preferred codons.

10. The synthetic gene of claim 1 wherein said protein is normally
expressed by a mammalian cell.

11. The synthetic gene of claim 1 wherein said protein is a retroviral
protein.

12. The synthetic gene of claim 1 wherein said protein is a lentiviral 

protein. 

13. The synthetic gene of claim 11 wherein said protein is an HIV
protein.

14. The synthetic gene of claim 13 wherein said protein is selected 
from the group consisting of gag, pol, and env. 

15. The synthetic gene of claim 13 wherein said protein is gp120.
16. The synthetic gene of claim 13 wherein said protein is gp160.



17. The synthetic gene of claim 1 wherein said protein is a human
protein.



18. The synthetic gene of claim 1 wherein said human protein is
Factor VIII.



19. The synthetic gene of claim 1 wherein 20% of the codons are
preferred codons.



20. The synthetic gene of claim 18 wherein said gene has the
coding sequence present in SEQ ID NO:42.

21. The synthetic gene of claim 1 wherein said protein is green
fluorescent protein.

22. The synthetic gene of claim 20 wherein said synthetic gene is
capable of expressing said green fluorescent protein at a level which is at least
200% of that expressed by said natural gene in an in vitro mammalian cell
culture system under identical conditions.



23. The synthetic gene of claim 20 wherein said synthetic gene is
capable of expressing said green fluorescent protein at a level which is at least
1000% of that expressed by said natural gene in an in vitro mammalian cell
culture system under identical conditions.
24. The synthetic gene of claim 21 having the sequence depicted
in Figure 11 (SEQ ID NO:40).

25. An expression vector comprising the synthetic gene of 

claim 1. 


26. The expression vector of claim 21, said expression vector
being a mammalian expression vector.

27. A mammalian cell harboring the synthetic gene of
claim 1.

28. A method for preparing a synthetic gene encoding a protein
normally expressed by mammalian cells, comprising identifying non-preferred
and less-preferred codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and less-preferred codons with a
preferred codon encoding the same amino acid as the replaced codon.
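The recoding step of claim 28 can be sketched in a few lines. This is a minimal illustration, not the applicants' procedure: it assumes an in-frame coding sequence, uses only the preferred codons listed in the Summary of the Invention, and replaces every codon whose amino acid has a listed preferred codon (the claim requires only "one or more"). Codons for Glu, Met, Trp and stops, which have no listed preference, are left unchanged, and no other design constraint (such as the CG count of claim 5) is applied. The example fragment and function names are hypothetical.

```python
# Preferred codons as listed in the Summary of the Invention.
# Glu, Met, Trp and stop codons have no listed preference and are kept.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc", "Q": "cag",
    "G": "ggc", "H": "cac", "I": "atc", "L": "ctg", "K": "aag", "P": "ccc",
    "F": "ttc", "S": "agc", "T": "acc", "Y": "tac", "V": "gtg",
}

# Standard genetic code; bases ordered t, c, a, g; '*' marks a stop codon
_BASES = "tcag"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    b1 + b2 + b3: _AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(_BASES)
    for j, b2 in enumerate(_BASES)
    for k, b3 in enumerate(_BASES)
}


def codons(cds: str) -> list:
    """Split an in-frame coding sequence into lowercase codons."""
    cds = cds.lower()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds: str) -> str:
    return "".join(CODON_TO_AA[c] for c in codons(cds))


def recode(cds: str) -> str:
    """Replace each codon by the listed preferred codon for its amino acid."""
    return "".join(PREFERRED.get(CODON_TO_AA[c], c) for c in codons(cds))


native = "atggcaaaatatgaatga"  # hypothetical fragment: Met-Ala-Lys-Tyr-Glu-stop
synthetic = recode(native)
assert translate(synthetic) == translate(native)  # same protein, different codons
print(synthetic)
```

The assertion captures the constraint shared by every claim: the encoded protein must not change, only the codons used to encode it.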
1 CTCGAGATCC ATTGTGCTCT AAAGGAGATA CCCCGCCACA CACCCTCACC
51 TGCGGTGCCC AGCTCCCCAG GCTGAGGCAA GAGAAGGCCA GAAACCATGC
101 CCATGGGGTC TCTGCAACCG CTGGCCACCT TGTACCTGCT GGGGATGCTG
151 GTCGCTTCCG TGCTAGCCAC CGAGAAGCTG TGGCTGACCG TGTACTACGG
201 CGTGCCCGTG TC-GAAGGAGG CCACCACCAC CCTCTTCTGC GCCAGCGACG
251 CCAAGGCGTA CGACACCGAG GTGCACAACG TGTCCCCCAC CCAGGCGTGC
301 GTGCCCACCG ACCCCAACCC CCAGGAGGTG GAGCTCGTGA ACGTGACCGA
351 GAACTTCAAC ATGTCGAAGA ACAACATGGT GGAGCAGATG CATGAGGACA
401 TCATCAGCCT GTGGGACCAG AGCCTGAAGC CCCGCGTGAA GCTGACCCCC
451 CTGTGCGTGA CCCTGAACTG CACCGACCTC AGGAACACCA CCAACACCAA
501 CAACACCACC CjCCAACAACA ACAGCAACAG CGAGGCCACC ATCAAGGGCG
551 GCGAGATGAA CAACTGCAGC TTCAACATCA CCACCAGCAT CCGCGACAAG
601 ATGCAGAAGG AGTACGCCCT GCTCTACAAG CTGGATATCG TGAGCATCGA
651 CAACGACAGC ACCAGCTACC GCCTGATCTC CTGCAACACC AGCGTGATCA
701 CCCAGGCCTG QCCCAAGATC AGCTTCGAGC CCATCCCCAT CCACTACTGC
751 GCCCCCGCCG CjGTTCGCCAT CCTGAAGTGC AACGACAAGA AGTTCAGCCG
801 CAAGGGCAGC TGCAAGAACC TGAGCACCGT GCAGTGCACC CACGGCATCC
851 GGCCGGTGGT GAGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG
901 GAGGTGGTCA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT
951 CGTGCACCTG AATGAGAGCG TGCAGATCAA CTCCACGCGT CCCAACTACA
1001 ACAAGCGCAA QCGCATCCAC ATCGGCCCCC GGCCCGCCTT CTACACCACC
1051 AAGAACATCA TCGGCACCAT CCGCCAGGCC CACTGCAACA TCTCTAGAGC
1101 CAAGTGGAAC GACACCCTGC GCCAGATCGT GAGCAAGCTG AAGGAGCAGT
1151 TCAAGAACAA CACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAC
1201 A.CGiGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC
1251 CAGCCCCCTG TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA
1301 CCACCGCCAG CAACAACAAT ATTACCCTCC AGTGCAAGAT CAAGCAGATC
1351 ATCAACATGT GGCAGGAGGT GGCCAAGGCC ATGTACGCCC CCCCCATCGA
1401 GGGCCACATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG CTGACCCGCG
1451 ACSCCGGCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCGCCCCGGC
1501 GCGGCCCACA TGCGCGACAA CTCGApATCT GAGCTGTACA AGTACAAGGT
1551 GGTGACGATC ^G.CCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCCCG
1601 TCGTCCACCG CGACAAGCGC TAAAGCGGCC CC (SEQ ID NO:34)
1 ACCCAGAACC CGTGTACTAC GGCGTGCCCG TGTGGAAGGA
51 GCCCACCACC A'.CCT wl. GCdcCAGCGA C.GCGAAGGCG TACGACACCG
101 AGGTCCACAA C'3TGTGGGCC ACCCAGGCGT CCGTGCCCAC CGACCCCAAC
151 CCCCACCACC TGCACCTCST GAACGTGAC.C GAGAACTTCA ACATGTGGAA
201 GAACAACATC CTGGACCAvjA TGCATGAGGA CATCATCAGC CTGTGGGACC
251 AGAGCCTGAA C|CCCTGGGTG AAGCTGACCC CCCTGTGCGT GACCCTCAAC
301 TGCACCCACC TGAGGAACAC CACCAACACC AACAACAGCA CCCCCAACAA
351 CAACAGCAsm AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA
401 fUtTTCAACAT CXCCACCAGC ATCCGCGACA ACATCCACAA GGAGTACCCC
451 CTGCTGTACA AGCTGGATAT CGTGAGCATC CACAACCACA UCAjCCACCTA
501 CCGCCTGATC TCCTGCAACA CCAGCCTGAT CACCCACCCC TCCCCCAACA
551 TCAGCTTCCA ^CCCATCCCC ATCCACTACT CCGCCCCCGC CGGCTTCGCC
601 ATCCTGAACT CpAACCACAA CAAGTTCAGC GGCAAGGGCA CCTGCAAGAA
651 TSTCACCACC TGCAGTGCA CCCACCGCAT CCGGCCGCTG CTGAGCACCC
701 ACCTCCTGCT QAACGCCACC CTCCCCGAGG AGGAGGTGGT GATCCGCAGC
751 CCCACAACGC CAAGACCATC ATCGTGCACC TGAATGAGAG
801 CGTGCAGATC AACTGCACGC CTCCCAACTA CAACAAGCGC AAGCGCATCC
851 ACATCGCCCC CGGGCGCGCC TTCTACACCA CCAAGAACAT CATCGGCACC
901 ATCCGCCACG CCCACTGCAA r i Tr~rr"T Art A Gd CI AAGTGG A ACGACACCCT
951 GCGCCAGATC ^TGAGCAAnC AAGXCCXTCC
1001 TCTTCAACCA GjAGCAGCCGC ^.rrcrc. ArtATCGTGAT GCACAGCTTC
1051 AACTGCGGCG GpGAATTCTT CTACTGCAAC ACCACCCCCC TGTTCAACAG
1101 CACCTGGAAC QGCAACAACA CCTGGAACAA CACCACCCCC ACCAACAACA
1151 ATATTACCCT CCAGTGCAAC ATCAAGCAGA TCATCAACAT GTGGCAGCAC
1201 GTGGGCAAGG CCATGTACCC CCCCCCCATC GACGGCCAGA TCCCCTCCAC
1251 CACCAACATC ACCCCTCTGC TCCTGACCCG CGACGGCGCC AAGGACACCG
1301 ACACCAAC'JA CACCGAAATC TTCCCCCCCG GCGGCGC-CG* CATGCGCGAC
1351 AAJCTGGAGAT CTCAGCTCTA CAAGTACAAG GTGGTGACGA TCGAGCCCCT
1401 COGCGTCCCC CCCACCAAGG CCAAGCGCCG CCTGGTGCAG CGCGAGAAGC
1451 GGGCCGCCAT CSGCGCCCTC TTCCTGCGCT TCCTGGCCCC CGCCCCCAGC
1501 ACCATGGGCG CCGCCAGCGT GACCCTGACC GTGCAGGCCC GCCTCCTCCT
1551 GAGCGGCATC CjTGCAGCAGC AGAACAACCT CCTCCGCGCC ATCGAGGCCC
1601 AGCAGCATAT QiTTCCAGCTC ACCGTGTGCG GCATCAAGCA GCTCCAGGCC
1651 CGCGTGCTGC COGTGGAGCC CTACCTGAAG GACCAGCAGC TCCTGGGCTT
1701 CTGGCGCTCC TCCGGCAAGC TGATCTCCAC CACCACGGTA C.CCTGGAACG
1751 CCTCCTGGAG C^ACAAGAGC CTGGACGACA TCTGGAACAA CATGACCTCG
1801 ATGCAGTG.GG ACCCCGAGAT CCATAACTAG ACCAGCCTGA TCTACAGCCT
1851 C-CTGGAGAAG AjC.CAGACCC ACCAGGAGAA GAACGAGCAG GAGCTGCTGG
1901 ACCTGCACAA CTGGGCGAGC CTCTGGAACT GGT~CGACAT CACCAACTGG
1951 CTGTGGTACA PCAAAATCTT CATCATGATT GTGCGCGCCC TGGTGGGCCT
2001 CCGCATCGTG TTCGCCGTGC TGAGCATCGT CAACCCCGTG CGCCAGGGCT
2051 ACAGCCCCCT GAGCCTCCAG ACCCGGCCCC CCGTGCCGCG CGGGCCCCAC
2101 CGCCCCGACG qCATCGAGGA GGAGGGCGGC GAGCGCGACC GCGACACCAG
2151 CGGCAGGCTC QTGCACGCCT TCCTCGCGAT CATCTGGGTC GACCTCCGCA
2201 GCCTGTTCCT ^TTCAGCTAC CACCACCGCG ACCTGCTGCT GATCGCCGCC
2251 CGCATCGTCG AACTCCTAGG CCGCCGCGGC TCGGAGGTGC TGAAGTACTG
2301 GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAACTCC AGCGCCGTGA
2351 GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCGAGGGCAC CGACCGCGTG
2401 ATCGAGGTGC TCCAGAGGGC CGGGAGGGCG ATCCTGCACA TCCCCACCCG
2451 CATCCGCCAG CCGCTCGAGA GGGCCCTGCT G (SEQ ID NO:35)
1 GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT
51 GTTCACCGGG GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG 
101 GCCACAAGTT CAGCGTGTCC GGCGAGGGCG AGGGCGATGC CACCTACGGC 

GGCAAGCTGC CCGTGCCCTG
201 GCCCACCCTC GTGACCACCT TCAGCTACGG CGTGCAGTGC TTCAGCCGCT
251 ACCCCGACCA CATGAAGCAG CACGACTTCT TCAAGTCCGC CATGCCCGAA
301 GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA 
351 GACCCGCGCC GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG
401 AGCTGAAGGG CATCGACTTC AAGGAGGACG GCAACATCCT GGGGCACAAG
451 CTGGAGTACA ACTACAACAG CCACAACGTC TATATCATGG CCGACAAGCA 
501 GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC ATCGAGGACG 
551 GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC 
601 GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT 
651 GAGCAAAGAC CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG
701 TGACCGCCGC CGGGATCACT CACGGCATGG ACGAGCTGTA CAAGTAAAGC 
751 GGCCGCGGAT CC (SEQ ID NO:40)
Native Factor VIII B domain deleted gene segment inserted in the
expression vector

1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC

51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCA GAAGATACTA 

101 CCTGGGTGGA GTGGAACTGT CATGGGACTA TATGCAAAGT GATCTCGGTG 

151 AGCTGCCTGT GGACGCAAGA TTTCCTCCTA GAGTGCCAAA ATCTTTTCCA 

201 TTCAACACCT CAGTCGTGTA CAAAAAGACT CTGTTTGTAG AATTCACGGA 

251 TCACCTTTTC AACATCGCTA AGCCAAGGCC ACCCTGGATG GGTCTGCTAG 

301 GTCCTACCAT CCAGGCTGAG GTTTATGATA CAGTGGTCAT TACACTTAAG 

351 AACATGGCTT CCCATCCTGT CAGTCTTCAT GCTGTTGGTG TATCCTACTG

401 GAAAGCTTCT GAGGGAGCTG AATATGATGA TCAGACCAGT CAAAGGGAGA

451 AAGAAGATGA TAAAGTCTTC CCTGGTGGAA GCCATACATA TGTCTGGCAG

501 GTCCTGAAAG AGAATGGTCC AATGGCCTCT GACCCACTGT GCCTTACCTA 

551 CTCATATCTT TCTCATGTGG ACCTGGTAAA AGACTTGAAT TCAGGCCTCA 

601 TTGGAGCCCT ACTAGTATGT AGAGAAGGGA GTCTGGCCAA GGAAAAGACA 

651 CAGACCTTGC ACAAATTTAT ACTACTTTTT GCTGTATTTG ATGAAGGGAA

701 AAGTTGGCAC TCAGAAACAA AGAACTCCTT GATGCAGGAT AGGGATGCTG 

751 CATCTGCTCG GGCCTGGCCT AAAATGCACA CAGTCAATGG TTATGTAAAC 

801 AGGTCTCTGC CAGGTCTGAT TGGATGCCAC AGGAAATCAG TCTATTGGCA 

851 TGTGATTGGA ATGGGCACCA CTCCTGAAGT GCACTCAATA TTCCTCGAAG 

901 GTCACACATT TCTTGTGAGG AACCATCGCC AGGCGTCCTT GGAAATCTCG 

951 CCAATAACTT TCCTTACTGC TCAAACACTC TTGATGGACC TTGGACAGTT

1001 TCTACTGTTT TGTCATATCT CTTCCCACCA ACATGATGGC ATGGAAGCTT 

1051 ATGTCAAAGT AGACAGCTGT CCAGAGGAAC CCCAACTACG AATGAAAAAT 

1101 AATGAAGAAG CGGAAGACTA TGATGATGAT CTTACTGATT CTGAAATGGA 

1151 TGTGGTCAGG TTTGATGATG ACAACTCTCC TTCCTTTATC CAAATTCGCT 

1201 CAGTTGCCAA GAAGCATCCT AAAACTTGGG TACATTACAT TGCTGCTGAA 

1251 GAGGAGGACT GGGACTATGC TCCCTTAGTC CTCGCCCCCG ATGACAGAAG 

1301 TTATAAAAGT CAATATTTGA ACAATGGCCC TCAGCGGATT GGTAGGAAGT

1351 ACAAAAAAGT CCGATTTATG GCATACACAG ATGAAACCTT TAAGACTCGT

1401 GAAGCTATTC AGCATGAATC AGGAATCTTG GGACCTTTAC TTTATGGGGA 

1451 AGTTGGAGAC ACACTGTTGA TTATATTTAA GAATCAAGCA AGCAGACCAT

1501 ATAACATCTA CCCTCACGGA ATCACTGATG TCCGTCCTTT GTATTCAAGG 

1551 AGATTACCAA AAGGTGTAAA ACATTTGAAG GATTTTCCAA TTCTGCCAGG 

1601 AGAAATATTC AAATATAAAT GGACAGTGAC TGTAGAAGAT GGGCCAACTA 

1651 AATCAGATCC TCGGTGCCTG ACCCGCTATT ACTCTAGTTT CGTTAATATG

1701 GAGAGAGATC TAGCTTCAGG ACTCATTGGC CCTCTCCTCA TCTGCTACAA 

1751 AGAATCTGTA GATCAAAGAG GAAACCAGAT AATGTCAGAC AAGAGGAATG 

1801 TCATCCTGTT TTCTGTATTT GATGAGAACC GAAGCTGGTA CCTCACAGAG 

1851 AATATACAAC GCTTTCTCCC CAATCCAGCT GGAGTGCAGC TTGAGGATCC 

1901 AGAGTTCCAA GCCTCCAACA TCATGCACAG CATCAATGGC TATGTTTTTG 

1951 ATAGTTTGCA GTTGTCAGTT TGTTTGCATG AGGTGGCATA CTGGTACATT 

2001 CTAAGCATTG GAGCACAGAC TGACTTCCTT TCTGTCTTCT TCTCTGGATA 

2051 TACCTTCAAA CACAAAATGG TCTATGAAGA CACACTCACC CTATTCCCAT 

2101 TCTCAGGAGA AACTGTCTTC ATGTCGATGG AAAACCCAGG TCTATGGATT 

2151 CTGGGGTGCC ACAACTCAGA CTTTCGGAAC AGAGGCATGA CCGCCTTACT 

2201 GAAGGTTTCT AGTTGTGACA AGAACACTGG TGATTATTAC GAGGACAGTT 

2251 ATGAAGATAT TTCAGCATAC TTGCTGAGTA AAAACAATGC CATTGAACCA 

2301 AGAAGCTTCT CCCAGAATTC AAGACACCCT AGCACTAGGC AAAAGCAATT 

2351 TAATGCCACC CCACCAGTCT TGAAACGCCA TCAACGGGAA ATAACTCGTA 

2401 CTACTCTTCA GTCAGATCAA GAGGAAATTG ACTATGATGA TACCATATCA 

2451 GTTGAAATGA AGAAGGAAGA TTTTGACATT TATGATGAGG ATGAAAATCA 

2501 GAGCCCCCGC AGCTTTCAAA AGAAAACACG ACACTATTTT ATTGCTGCAG 

2551 TGGAGAGGCT CTGGGATTAT GGGATGAGTA GCTCCCCACA TGTTCTAAGA 

2601 AACAGGGCTC AGAGTGGCAG TGTCCCTCAG TTCAAGAAAG TTGTTTTCCA

2651 GGAATTTACT GATGGCTCCT TTACTCAGCC CTTATACCGT GGAGAACTAA 

2701 ATGAACATTT GGGACTCCTG GGGCCATATA TAAGAGCAGA AGTTGAAGAT
2751 AATATCATGG TAACTTTCAG AAATCAGGCC TCTCGTCCCT ATTCCTTCTA

2801 TTCTAGCCTT ATTTCTTATG AGGAAGATCA GAGGCAAGGA GCAGAACCTA 

2851 GAAAAAACTT TGTCAAGCCT AATGAAACGA AAACTTACTT TTGGAAAGTG 

2901 CAACATCATA TGGCACCCAC TAAAGATGAG TTTGACTGCA AAGCCTGGGC 

2951 TTATTTCTCT GATGTTGACC TGGAAAAAGA TGTGCACTCA GGCCTGATTG

3001 GACCCCTTCT GGTCTGCCAC ACTAACACAC TGAACCCTGC TCATGGGAGA 

3051 CAAGTGACAG TACAGGAATT TGCTCTGTTT TTCACCATCT TTGATGAGAC 

3101 CAAAAGCTGG TACTTCACTG AAAATATGGA AAGAAACTGC AGGGCTCCCT 

3151 GCAATATCCA GATGGAAGAT CCCACTTTTA AAGAGAATTA TCGCTTCCAT 

3201 GCAATCAATG GCTACATAAT GGATACACTA CCTGGCTTAG TAATGGCTCA

3251 GGATCAAAGG ATTCGATGGT ATCTGCTCAG CATGGGGAGC AATGAAAACA

3301 TCCATTCTAT TCATTTCAGT GGAGATGTGT TCACTGTACG AAAAAAAGAG 

3351 GAGTATAAAA TCGCACTGTA CAATCTCTAT CCAGGTGTTT TTGAGACAGT 

3401 GGAAATGTTA CCATCCAAAG CTGGAATTTG GCGGGTGGAA TGCCTTATTG 

3451 GCGAGCATCT ACATGCTGGG ATGAGCACAC TTTTTCTGGT GTACAGCAAT

3501 AAGTGTCAGA CTCCCCTGGG AATGGCTTCT GGACACATTA GAGATTTTCA 

3551 GATTACAGCT TCAGGACAAT ATGGACAGTG GGCCCCAAAG CTGGCCAGAC 

3601 TTCATTATTC CGGATCAATC AATGCCTGGA GCACCAAGGA GCCCTTTTCT 

3651 TGGATCAAGG TGGATCTGTT GGCACCAATG ATTATTCACG GCATCAAGAC

3701 CCAGGGTGCC CGTCAGAAGT TCTCCAGCCT CTACATCTCT CAGTTTATCA 

3751 TCATGTATAG TCTTGATGGG AAGAAGTGGC AGACTTATCG AGGAAATTCC 

3801 ACTGGAACCT TAATGGTCTT CTTTGGCAAT GTGGATTCAT CTGGGATAAA

3851 ACACAATATT TTTAACCCTC CAATTATTGC TCGATACATC CGTTTGCACC 

3901 CAACTCATTA TAGCATTCGC AGCACTCTTC GCATGGAGTT GATGGGCTGT 

3951 GATTTAAATA GTTGCAGCAT GCCATTGGGA ATGGAGAGTA AAGCAATATC

4001 AGATGCACAG ATTACTGCTT CATCCTACTT TACCAATATG TTTGCCACCT 

4051 GGTCTCCTTC AAAAGCTCGA CTTCACCTCC AAGGGAGGAG TAATGCCTGG

4101 AGACCTCAGG TGAATAATCC AAAAGAGTGG CTGCAAGTGG ACTTCCAGAA

4151 GACAATGAAA GTCACAGGAG TAACTACTCA GGGAGTAAAA TCTCTGCTTA 

4201 CCAGCATGTA TGTGAAGGAG TTCCTCATCT CCAGCAGTCA AGATGGCCAT 

4251 CAGTGGACTC TCTTTTTTCA GAATGGCAAA GTAAAGGTTT TTCAGGGAAA

4301 TCAAGACTCC TTCACACCTG TGGTGAACTC TCTAGACCCA CCGTTACTGA 

4351 CTCGCTACCT TCGAATTCAC CCCCAGAGTT GGGTGCACCA GATTGCCCTG 

4401 AGGATGGAGG TTCTGGGCTG CGAGGCACAG GACCTCTACT GAGGGTGGCC 

4451 ACTGCAGCAC CTGCCACTGC CGTCACCTCT CCCTCCTCAG CTCCAGGGCA 

4501 GTGTCCCTCC CTGGCTTGCC TTCTACCTTT GTGCTAAATC CTAGCAGACA 

4551 CTGCCTTGAA GCCTCCTGAA TTAACTATCA TCAGTCCTGC ATTTCTTTGG 

4601 TGGGGGGCCA GGAGGGTGCA TCCAATTTAA CTTAACTCTT ACCGTCGACC 

4651 TGCAGGCCCA ACGCGGCCGC

Synthetic Factor VIII B domain deleted gene segment inserted in the
expression vector

1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC 
51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCC GCCGCTACTA
101 CCTGGGCGCC GTGGAGCTGT CCTGGGACTA CATGCAGAGC GACCTGGGCG 
151 AGCTCCCCGT GGACGCCCGC TTCCCCCCCC GCGTGCCCAA GAGCTTCCCC 
201 TTCAACACCA GCGTGGTGTA CAAGAAAACC CTGTTCGTGG AGTTCACCGA 
251 CCACCTGTTC AACATTGCCA AGCCGCGCCC CCCCTGGATG GGCCTGCTGG
301 GCCCCACCAT CCAGGCCGAG GTGTACGACA CCGTGCTCAT CACCCTGAAG 
351 AACATGGCCA GCCACCCCGT CACCCTGCAC GCCGTGGGCG TGAGCTACTG
401 GAAGGCCAGC GAGGGCGCCG AGTACGACGA CCAGACGTCC CAGCGCGAGA
451 AGGAGGACGA CAAGGTGTTC CCGGGGGGGA GCCACACCTA CGTGTGGCAG 
501 GTGCTTAAGG AGAACGGCCC TATGGCCAGC GACCCCCTGT GCCTGACCTA 
551 CAGCTACCTG AGCCACGTGG ACCTGGTGAA GGATCTGAAC AGCGGGCTGA
601 TCGGCGCCCT GCTGGTGTGT CGCGAGGGCA GCCTGGCCAA GGAGAAAACC 
651 CAGACCCTGC ACAAGTTCAT CCTGCTGTTC GCCGTGTTCG ACGAGGGGAA 
701 GAGCTGGCAC AGCGAGACTA AGAACAGCCT GATGCAGGAC CGCGACGCCG
751 CCAGCGCCCG CGCCTGGCCC AAGATGCACA CCGTTAACGG CTACGTGAAC 
801 CGCAGCCTGC CCGGCCTGAT CGGCTGCCAC CGCAAGAGCG TGTACTGGCA 
851 CGTCATCGGC ATGGGCACCA CCCCTGAGGT GCACAGCATC TTCCTGGAGG 
901 GCCACACCTT CCTGGTGCGC AACCACCGCC AGGCCAGCCT GGAGATCAGC 
951 CCCATCACCT TCCTGACTGC CCAGACCCTG CTGATGGACC TAGGCCAGTT 
1001 CCTGCTGTTC TGCCACATCA GCAGCCACCA GCACGACGGC ATGGAGGCTT 
1051 ACGTGAAGGT GGACAGCTGC CCCGAGGAGC CCCAGCTGCG CATGAAGAAC 
1101 AACGAGGAGG CCGAGGACTA CGACGACGAC CTGACCGACA GCGAGATGGA 
1151 TGTCGTACGC TTCGACGACG ACAACAGCCC CAGCTTCATC CAGATCCGCA 
1201 GCGTGGCCAA GAAGCACCCT AAGACCTGGG TGCACTACAT CGCCGCCGAG 
1251 GAGGAGGACT GGGACTACGC CCCGCTAGTA CTGGCCCCCG ACGACCGCAG 
1301 CTACAAGAGC CAGTACCTGA ACAACGGCCC CCAGCGCATC GGCCGCAAGT 
1351 ACAAGAAGGT GCGCTTCATG GCCTACACCG ACGAGACTTT CAAGACCCGC 
1401 GAGGCCATCC AGCACGAGTC CGGCATCCTC GGCCCCCTGC TGTACGGCGA 
1451 GGTGGGCGAC ACCCTGCTGA TCATCTTCAA GAACCAGGCC AGCAGGCCCT 
1501 ACAACATCTA CCCCCACGGC ATCACCGACG TGCGCCCCCT GTACAGCCGC 
1551 CGCCTGCCCA AGGGCGTGAA GCACCTGAAG GACTTCCCCA TCCTGCCCGG 
1601 CGAGATCTTC AAGTACAAGT GGACCGTGAC CGTGGAGGAC GGCCCCACCA 
1651 AGAGCGACCC CCGCTGCCTG ACCCGCTACT ACAGCAGCTT CGTGAACATG 
1701 GAGCGCGACC TGGCCTCCGG ACTGATCGGC CCCCTGCTGA TCTGCTACAA 
1751 GGAGAGCGTG GACCAGCGCG GCAACCAGAT CATGAGCGAC AAGCGCAACG
1801 TGATCCTGTT CAGCGTGTTC GACGAGAACC GCAGCTGGTA TCTGACCGAG 
1851 AACATCCAGC GCTTCCTGCC CAACCCCGCT GGCGTGCAGC TGGAAGATCC 
1901 CGAGTTCCAG GCCAGCAACA TCATGCACAG CATCAACGGC TACGTGTTCG
1951 ACAGCCTGCA GCTGAGCGTG TGCCTGCATG AGGTGGCCTA CTGGTACATC 
2001 CTGAGCATCG GCGCCCAGAC CGACTTCCTG AGCGTGTTCT TCTCCGGGTA
2051 TACCTTCAAG CACAAGATGG TGTACGAGGA CACCCTGACC CTGTTCCCCT 
2101 TCTCCGGCGA GACTGTGTTC ATGTCTATGG AGAACCCCGG CCTGTGGATT 
2151 CTGGGCTGCC ACAACAGCGA CTTCCGCAAC CGCGGCATGA CTGCCCTGCT 
2201 GAAAGTCTCC AGCTGCGACA AGAACACCGG CGACTACTAC GAGGACAGCT
2251 ACGAGGACAT CTCCGCCTAC CTGCTGTCCA AGAACAACGC CATCGAGCCC
2301 CGCTCCTTCT CCCAAAACTC CCGCCACCCC AGCACGCGTC AGAAGCAGTT
2351 CAACGCCACC CCCCCCGTGC TGAAGCGCCA CCAGCGCGAG ATCACCCGCA 
2401 CCACCCTGCA AAGCGACCAG GAGGAGATCG AGTACGACGA CACCATCAGC 
2451 GTGGAGATGA AGAAGGAGGA CTTCGACATC TACGACGAGG ACGAGAACCA 
2501 GAGCCCCCGC TCCTTCCAAA AGAAAACCCG CCACTACTTC ATCGCCGCCG 
2551 TGGAGCGCCT CTGGGACTAC CGCATGAGCA GCAGCCCCCA CGTCCTGCGC
2601 AACCGCGCCC AGAGCGGCAG CGTGCCCCAC TTCAAGAAGG TGGTGTTCCA 
2651 GGAGTTCACC GACGGCAGCT TCACCCAGCC CCTGTACCGC GGCGAGCTGA 
2701 ACGAGCACCT GGGCCTGCTC GGCCCCTACA TCCGCGCCGA CGTGGAGGAC
2751 AACATGATGG TGACCTTCCG CAACCAAGCO TCCCGGCCCX ACTCCTTCTA
2801 CTCCTCCCTG ATCAGCTACG AGGAGGACCA GCGCCAGGGC GCCGAGCCCC
2851 GCAAGAACTT CGTGAAGCCC AACGAGACTA AGACCTACTT CTGGAAGGTG
2901 CAGCACCACA TGGCCCCCAC CAAGGACGAG TTCGACTGCA AGGCCTGGGC
2951 CTACTTCAGC GACGTGGACC TGGAbAAGGA CGTGCACAGC GGCCTGATCG
3001 GCCCCCTGCT GGTGTGCCAC ACCAACACCC TGAACCCCCC CCACGGGAGG
3051 CAGGTGACTG TCCXGGAATT TGCCCTGTTC TTCACCATCT TCGACGAGAC
3101 TAAGAGCTGG TACTTCACCG AGAACATGGA GCGCAACTGC CGCGCCCCCT
3151 GGAACATCCA GATGGAAGAT CCCACCTTCA AGGAGAACTA CCGCTTCCAC
3201 GCCATCAACG GCTACATCAT GGACACCCTG CCCGGCCTGG TGATGGCCCA
3251 GGACCAGCGC ATCCGCTGGT ACCTGCTGTC TATGGGCAGC AACGAGAACA
3301 TCCACAGCAT CCACTTCAGC GGCCACGTTT TCACCGTGCG CAAGAAGGAG
3351 GAGTACAAGA TGGCCCTGTA CAACCTGTAC CCCGGCGTGT TCGAGACTGT
3401 GGAGATGCTG CCCAGCAAGG CCGGGATCTG GCGCGTGGAG TGCCTGATCG
3451 GCGAGCACCT GCACGCCGGC ATGAGCACCC TGTTCCTGGT GTACAGCAAC
3501 AAGTGCCAGA CCCCCCTGGG CATGGCGAGC GGCCACATCC GCGACTTCCA
3551 GATCACCGCC AGCGGCCAGT ACGGCGAGTG GGCTCCCAAG CTGGCCCGCC
3601 TGCACTACAG CGGCAGCATC AACGCCTGGT CGACCAAGGA GCCCTTCTCC
3651 TGGATCAAGG TGGACCTGCT GGCCCCCATG ATCATCCACG GCATCAAGAC
3701 CCAGGGCGCC CGCCAGAAGT TCAGCAGCCT GTACATCAGC CAGTTCATCA
3751 TCATGTACTC TCTAGACGGC AAGAAGTGGC AGACCTACCG CGGCAACAGC
3801 ACCGGCACCC TGATGGTGTT CTTCGGCAAC GTGGACAGCA GCGGCATCAA
3851 GCACAACATC TTCAACCCCC CCATCATCGC CCGCTACATC CGCCTCCACC
3901 CCACCCACTA CAGCATCCGC AGCACCCTGC GCATGGAGCT GATGGGCTGC
3951 GACCTGAACA GCTGCAGCAT GCCCCTGGGC ATGGAGAGCA AGGCCATCAG
4001 CGACGCCCAG ATCACCGCCT CCAGCTACTT CACCAACATG TTCGCCACCT
4051 GGAGCCCCAG CAAGGCCCGC CTGCACCTGC AGGGCCGCAG CAACGCCTGG
4101 CGCCCCCAGG TGAACAACCC CAAGGAGTGG CTGCAGGTGG ACTTCCAGAA
4151 AACCATGAAG GTGACTGGCG TGACCACCCA GGGCGTCAAG AGCCTGCTGA
4201 CCAGCATGTA CGTGAAGGAG TTCCTGATCA GCAGCAGCCA GGACGGCCAC
4251 CAGTGGACCC TGTTCTTCCA AAACGGCAAG GTGAAGGTGT TCCAGGGCAA
4301 CCAGGACAGC TTCACACCGG TGGTGAACAG CCTGGACCCC CCCCTGCTGA
4351 CCCGCTACCT GCGCATCCAC CCCCAGAGCT GGGTGCACCA GATCGCCCTG
4401 CGGATGGAGG TGCTGGGCTG CGAGGCCCAG GACCTGTACT GAAGCGGCCG
4451 C

HIGH LEVEL EXPRESSION OF PROTEINS
Field of the Invention
The invention concerns genes and methods for expressing eukaryotic
and viral proteins at high levels in eukaryotic cells.

Background of the Invention
Expression of eukaryotic gene products in prokaryotes is sometimes
limited by the presence of codons that are infrequently used in E. coli.
Expression of such genes can be enhanced by systematic substitution of the
endogenous codons with codons overrepresented in highly expressed
prokaryotic genes (Robinson et al., Nucleic Acids Res. 12:6663, 1984). It is
commonly supposed that rare codons cause pausing of the ribosome, which
leads to a failure to complete the nascent polypeptide chain and an uncoupling of
transcription and translation. Pausing of the ribosome is thought to lead to
exposure of the 3' end of the mRNA to cellular ribonucleases.

Summary of the Invention
The invention features a synthetic gene encoding a protein normally
expressed in a mammalian cell or other eukaryotic cell wherein at least one
non-preferred or less preferred codon in the natural gene encoding the protein
has been replaced by a preferred codon encoding the same amino acid.

Preferred codons are: Ala (gcc); Arg (cgc); Asn (aac); Asp (gac); Cys
(tgc); Gln (cag); Gly (ggc); His (cac); Ile (atc); Leu (ctg); Lys (aag); Pro (ccc);
Phe (ttc); Ser (agc); Thr (acc); Tyr (tac); and Val (gtg). Less preferred codons
are: Gly (ggg); Ile (att); Leu (ctc); Ser (tcc); Val (gtc); and Arg (agg). All
codons which do not fit the description of preferred codons or less preferred
codons are non-preferred codons. In general, the degree of preference of a
particular codon is indicated by the prevalence of the codon in highly expressed
human genes as indicated in Table 1 under the heading "High." For example,
"atc" represents 77% of the Ile codons in highly expressed mammalian genes
and is the preferred Ile codon; "att" represents 18% of the Ile codons in highly
expressed mammalian genes and is the less preferred Ile codon. The sequence
"ata" represents only 5% of the Ile codons in highly expressed human genes and
is a non-preferred Ile codon. Replacing a codon with another codon that is
more prevalent in highly expressed human genes will generally increase
expression of the gene in mammalian cells. Accordingly, the invention
includes replacing a less preferred codon with a preferred codon as well as
replacing a non-preferred codon with a preferred or less preferred codon.
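The three codon classes defined above amount to a lookup table. The following Python sketch is illustrative only and is not part of the original disclosure: it assumes lowercase DNA codons and encodes just the preferred and less preferred codons listed in this section, so every other codon falls into the non-preferred class by default. By that literal definition the single Met (atg) and Trp (tgg) codons are also classed as non-preferred, although they have no synonymous alternative.

```python
# Codon classes exactly as listed in this section (lowercase DNA, keyed by
# one-letter amino acid code). No Glu codon is named in either list.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
    "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
    "Y": "tac", "V": "gtg",
}
LESS_PREFERRED = {"G": "ggg", "I": "att", "L": "ctc", "S": "tcc", "V": "gtc", "R": "agg"}


def classify(codon: str) -> str:
    """Return 'preferred', 'less preferred' or 'non-preferred' for one codon."""
    c = codon.lower()
    if c in PREFERRED.values():
        return "preferred"
    if c in LESS_PREFERRED.values():
        return "less preferred"
    # Anything not named in the two lists is non-preferred by definition.
    return "non-preferred"


# The three Ile codons used as the example in the text.
for codon in ("atc", "att", "ata"):
    print(codon, classify(codon))
```

Run as written, the loop reports atc as preferred, att as less preferred and ata as non-preferred, matching the Ile example above.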
By "protein normally expressed in a mammalian cell" is meant a
protein which is expressed in mammalian cells under natural conditions. The term
includes genes in the mammalian genome such as those encoding Factor VIII,
Factor IX, interleukins, and other proteins. The term also includes genes which
are expressed in a mammalian cell under disease conditions, such as oncogenes,
as well as genes which are encoded by a virus (including a retrovirus) which
are expressed in mammalian cells post-infection. By "protein normally
expressed in a eukaryotic cell" is meant a protein which is expressed in a
eukaryote under natural conditions. The term also includes genes which are
expressed in a mammalian cell under disease conditions.
In preferred embodiments, the synthetic gene is capable of
expressing the mammalian or eukaryotic protein at a level which is at least
110%, 150%, 200%, 500%, 1,000%, 5,000% or even 10,000% of that
expressed by the "natural" (or "native") gene in an in vitro mammalian cell
culture system under identical conditions (i.e., same cell type, same culture
conditions, same expression vector).

Suitable cell culture systems for measuring expression of the
synthetic gene and corresponding natural gene are described below. Other
suitable expression systems employing mammalian cells are well known to
those skilled in the art and are described in, for example, the standard
molecular biology reference works noted below. Vectors suitable for
expressing the synthetic and natural genes are described below and in the
standard reference works described below. By "expression" is meant protein
expression. Expression can be measured using an antibody specific for the
protein of interest. Such antibodies and measurement techniques are well
known to those skilled in the art. By "natural gene" and "native gene" is meant
the gene sequence (including naturally occurring allelic variants) which
naturally encodes the protein, i.e., the native or natural coding sequence.

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%,
60%, 70%, 80%, or 90% of the codons in the natural gene are non-preferred
codons.

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%,
60%, 70%, 80%, or 90% of the non-preferred codons in the natural gene are
replaced with preferred codons or less preferred codons.

In other preferred embodiments at least 10%, 20%, 30%, 40%, 50%,
60%, 70%, 80%, or 90% of the non-preferred codons in the natural gene are
replaced with preferred codons.
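The percentage thresholds in these embodiments can be computed directly. A minimal Python sketch under stated assumptions: the natural and synthetic coding sequences are in frame and aligned codon for codon (synonymous substitution only), the codon classes are those listed earlier in this section, and the function names are hypothetical.

```python
# Codon classes from the lists given earlier in this section (lowercase DNA).
PREFERRED = {"gcc", "cgc", "aac", "gac", "tgc", "cag", "ggc", "cac", "atc",
             "ctg", "aag", "ccc", "ttc", "agc", "acc", "tac", "gtg"}
LESS_PREFERRED = {"ggg", "att", "ctc", "tcc", "gtc", "agg"}


def codons(seq: str) -> list:
    """Split an in-frame coding sequence into codons."""
    seq = seq.lower()
    if not seq or len(seq) % 3:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def is_non_preferred(codon: str) -> bool:
    return codon not in PREFERRED and codon not in LESS_PREFERRED


def percent_non_preferred(seq: str) -> float:
    """Percentage of all codons in `seq` that are non-preferred."""
    cs = codons(seq)
    return 100.0 * sum(is_non_preferred(c) for c in cs) / len(cs)


def percent_replaced(natural: str, synthetic: str, targets=PREFERRED) -> float:
    """Percentage of the natural gene's non-preferred codons whose codon at the
    same position in the synthetic gene belongs to `targets`."""
    nat, syn = codons(natural), codons(synthetic)
    if len(nat) != len(syn):
        raise ValueError("genes must contain the same number of codons")
    positions = [i for i, c in enumerate(nat) if is_non_preferred(c)]
    if not positions:
        return 0.0
    return 100.0 * sum(syn[i] in targets for i in positions) / len(positions)


print(percent_non_preferred("ataattatc"))
print(percent_replaced("ataaga", "attcgc"))
```

Passing `PREFERRED | LESS_PREFERRED` as `targets` gives the "preferred codons or less preferred codons" variant of the second embodiment.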
In a preferred embodiment the protein is a retroviral protein. In a
more preferred embodiment the protein is a lentiviral protein. In an even more
preferred embodiment the protein is an HIV protein. In other preferred
embodiments the protein is gag, pol, env, gp120, or gp160. In other preferred
embodiments the protein is a human protein. In more preferred embodiments,
the protein is human Factor VIII or B region deleted human Factor VIII. In
another preferred embodiment the protein is green fluorescent protein.

In various preferred embodiments at least 30%, 40%, 50%, 60%,
70%, 80%, 90%, or 95% of the codons in the synthetic gene are preferred or
less preferred codons.

The invention also features an expression vector comprising the
synthetic gene.

In another aspect the invention features a cell harboring the synthetic
gene. In various preferred embodiments the cell is a prokaryotic cell or a
mammalian cell.

In preferred embodiments the synthetic gene includes fewer than 50,
fewer than 40, fewer than 30, fewer than 20, fewer than 10, fewer than 5, or no
"cg" sequences.

The invention also features a method for preparing a synthetic gene
encoding a protein normally expressed by a mammalian cell or other eukaryotic
cell. The method includes identifying non-preferred and less-preferred codons
in the natural gene encoding the protein and replacing one or more of the
non-preferred and less-preferred codons with a preferred codon encoding the
same amino acid as the replaced codon.

Under some circumstances (e.g., to permit introduction of a
restriction site) it may be desirable to replace a non-preferred codon with a less
preferred codon rather than a preferred codon.
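The identify-and-replace method can be sketched as follows. This Python sketch is illustrative, not the procedure used to design the genes in the Examples. It assumes the standard genetic code and the preferred-codon list given earlier, leaves Glu, Met, Trp and stop codons unchanged because that list names no preferred codon for them, and does not model the restriction-site exception just described. Because the text also contemplates partial replacement, the sketch adds a hypothetical `fraction` parameter that swaps only a seeded random subset of the replaceable codons.

```python
import random
from itertools import product

# Standard genetic code; codons enumerated with bases in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Preferred codon per amino acid, as listed in this section.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
    "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
    "Y": "tac", "V": "gtg",
}


def split_codons(seq: str) -> list:
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    return "".join(CODE[c] for c in split_codons(seq))


def optimize(seq: str, fraction: float = 1.0, seed: int = 0) -> str:
    """Replace codons by the preferred synonymous codon.

    `fraction` (hypothetical) is the share of replaceable codons actually
    swapped; positions are drawn with a seeded RNG so results are repeatable.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    cs = split_codons(seq)
    # Replaceable: the amino acid has a listed preferred codon not already used.
    candidates = [i for i, c in enumerate(cs)
                  if CODE[c] in PREFERRED and c != PREFERRED[CODE[c]]]
    k = round(fraction * len(candidates))
    for i in random.Random(seed).sample(candidates, k):
        cs[i] = PREFERRED[CODE[cs[i]]]
    return "".join(cs)


native = "ATGATAAGATTA"  # Met-Ile-Arg-Leu
print(optimize(native))                # every replaceable codon swapped
print(optimize(native, fraction=0.5))  # partial replacement
print(translate(native) == translate(optimize(native, fraction=0.5)))
```

`translate` provides a quick check that any degree of replacement leaves the encoded protein unchanged.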

It is not necessary to replace all less preferred or non-preferred
codons with preferred codons. Increased expression can be accomplished even
with partial replacement of less preferred or non-preferred codons with
preferred codons. Under some circumstances it may be desirable to only
partially replace non-preferred codons with preferred or less preferred codons
in order to obtain an intermediate level of expression.

In other preferred embodiments the invention features vectors
(including expression vectors) comprising one or more of the synthetic genes.

By "vector" is meant a DNA molecule, derived, e.g., from a plasmid,
bacteriophage, or mammalian or insect virus, into which fragments of DNA
may be inserted or cloned. A vector will contain one or more unique restriction
sites and may be capable of autonomous replication in a defined host or vehicle
organism such that the cloned sequence is reproducible. Thus, by "expression
vector" is meant any autonomous element capable of directing the synthesis of
a protein. Such DNA expression vectors include mammalian plasmids and
viruses.

The invention also features synthetic gene fragments which encode a
desired portion of the protein. Such synthetic gene fragments are similar to the
synthetic genes of the invention except that they encode only a portion of the
protein. Such gene fragments preferably encode at least 50, 100, 150, or 500
contiguous amino acids of the protein.

In constructing the synthetic genes of the invention it may be
desirable to avoid CpG sequences as these sequences may cause gene silencing.
Thus, in a preferred embodiment the coding region of the synthetic gene does
not include the sequence "cg".
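A CpG screen only needs to scan adjacent base pairs, including pairs that straddle two codons. A minimal Python sketch (function names are hypothetical). With the codon lists given earlier, a CpG can arise inside a codon (the preferred Arg codon cgc contains one) or at a junction, for example when gcc is followed by a codon beginning with g.

```python
def cpg_positions(seq: str) -> list:
    """0-based start positions of every 'cg' dinucleotide in `seq`,
    whether it lies inside a codon or spans two codons."""
    s = seq.lower()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "cg"]


def codon_junction_cpgs(seq: str) -> list:
    """CpG positions that straddle a codon boundary (reading frame starting at
    index 0): the 'c' is the third base of one codon, the 'g' the first of the next."""
    return [i for i in cpg_positions(seq) if i % 3 == 2]


print(cpg_positions("ACGCGT"))        # two overlapping-free CpGs
print(codon_junction_cpgs("gccgcc"))  # Ala-Ala junction creates a CpG
```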
The codon bias present in the HIV gp120 env gene is also present in
the gag and pol genes. Thus, replacement of a portion of the non-preferred and
less preferred codons found in these genes with preferred codons should
produce a gene capable of higher level expression. A large fraction of the
codons in the human genes encoding Factor VIII and Factor IX are non-preferred
codons or less preferred codons. Replacement of a portion of these
codons with preferred codons should yield genes capable of higher level
expression in mammalian cell culture.

The synthetic genes of the invention can be introduced into the cells
of a living organism. For example, vectors (viral or non-viral) can be used to
introduce a synthetic gene into cells of a living organism for gene therapy.

Conversely, it may be desirable to replace preferred codons in a
naturally occurring gene with less-preferred codons as a means of lowering
expression.
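Lowering expression in this way is the reverse substitution. A minimal Python sketch, assuming substitution only for the six amino acids for which this section names both a preferred and a less preferred codon (Gly, Ile, Leu, Ser, Val, Arg); the mapping and function names are hypothetical, and non-preferred codons are not used as targets.

```python
# Preferred codon -> less preferred codon for the same amino acid, built from
# the two lists given earlier in this section.
PREFERRED_TO_LESS = {
    "ggc": "ggg",  # Gly
    "atc": "att",  # Ile
    "ctg": "ctc",  # Leu
    "agc": "tcc",  # Ser
    "gtg": "gtc",  # Val
    "cgc": "agg",  # Arg
}


def deoptimize(seq: str) -> str:
    """Swap each preferred codon that has a listed less preferred synonym."""
    seq = seq.lower()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = (seq[i:i + 3] for i in range(0, len(seq), 3))
    return "".join(PREFERRED_TO_LESS.get(c, c) for c in codons)


print(deoptimize("ATGCTGGGCAAG"))  # Met-Leu-Gly-Lys; Met and Lys stay unchanged
```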

Standard reference works describing the general principles of
recombinant DNA technology include Watson et al., Molecular Biology of the
Gene, Volumes I and II, The Benjamin/Cummings Publishing Company, Inc.,
publisher, Menlo Park, CA (1987); Darnell et al., Molecular Cell Biology,
Scientific American Books, Inc., publisher, New York, NY (1986); Old et al.,
Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2d
edition, University of California Press, publisher, Berkeley, CA (1981);
Maniatis et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring
Harbor Laboratory, publisher, Cold Spring Harbor, NY (1989); and Current
Protocols in Molecular Biology, Ausubel et al., Wiley Press, New York, NY
(1992).

By "transformed cell" is meant a cell into which (or into an ancestor
of which) has been introduced, by means of recombinant DNA techniques, a
selected DNA molecule, e.g., a synthetic gene.

By "positioned for expression" is meant that a DNA molecule, e.g., a
synthetic gene, is positioned adjacent to a DNA sequence which directs
transcription and translation of the sequence (i.e., facilitates the production of
the protein encoded by the synthetic gene).

Description of the Drawings

Figure 1 depicts the sequence of the synthetic gp120 and a synthetic
gp160 gene in which codons have been replaced by those found in highly
expressed human genes.

Figure 2 is a schematic drawing of the synthetic gp120 (HIV-1 MN)
gene. The shaded portions marked v1 to v5 indicate hypervariable regions. The
filled box indicates the CD4 binding site. A limited number of the unique
restriction sites are shown: H (Hind3), Nh (NheI), P (PstI), Na (NaeI), M
(MluI), R (EcoRI), A (AgeI) and No (NotI). The chemically synthesized
DNA fragments which served as PCR templates are shown below the gp120
sequence, along with the locations of the primers used for their amplification.

Figure 3 is a photograph of the results of transient transfection assays
used to measure gp120 expression. Gel electrophoresis of immunoprecipitated
supernatants of 293T cells transfected with plasmids expressing gp120 encoded
by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of HIV-1
(gp120mn), by the MN isolate of HIV-1 modified by substitution of the
endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or
by the chemically synthesized gene encoding the MN variant of HIV-1 with the
human CD5 leader (syngp120mn). Supernatants were harvested following a 12
hour labeling period 60 hours post-transfection and immunoprecipitated with
CD4:IgG1 fusion protein and protein A sepharose.

Figure 4 is a graph depicting the results of ELISA assays used to
measure protein levels in supernatants of transiently transfected 293T cells.
Supernatants of 293T cells transfected with plasmids expressing gp120
encoded by the IIIB isolate of HIV-1 (gp120IIIb), by the MN isolate of HIV-1
(gp120mn), by the MN isolate of HIV-1 modified by substitution of the
endogenous leader peptide with that of the CD5 antigen (gp120mnCD5L), or by
the chemically synthesized gene encoding the MN variant of HIV-1 with the
human CD5 leader (syngp120mn) were harvested after 4 days and tested in a
gp120/CD4 ELISA. The level of gp120 is expressed in ng/ml.

Figure 5A is a photograph of a gel illustrating the results of an
immunoprecipitation assay used to measure expression of the native and
synthetic gp120 in the presence of rev in trans and the RRE in cis. In this
experiment 293T cells were transiently transfected by calcium phosphate
co-precipitation of 10 µg of plasmid expressing: (A) the synthetic gp120MN
sequence and RRE in cis, (B) the gp120 portion of HIV-1 IIIB, or (C) the gp120
portion of HIV-1 IIIB and RRE in cis, all in the presence or absence of rev
expression. The RRE constructs gp120IIIbRRE and syngp120mnRRE were
generated using an EagI/HpaI RRE fragment cloned by PCR from an HIV-1
HXB2 proviral clone. Each gp120 expression plasmid was cotransfected with
10 µg of either pCMVrev or CDM7 plasmid DNA. Supernatants were
harvested 60 hours post-transfection, immunoprecipitated with CD4:IgG fusion
protein and protein A agarose, and run on a 7% reducing SDS-PAGE gel. The gel
exposure time was extended to allow the induction of gp120IIIbRRE by rev to be
demonstrated.

Figure 5B is a shorter exposure of a similar experiment in which
syngp120mnRRE was cotransfected with or without pCMVrev.

Figure 5C is a schematic diagram of the constructs used in Figure
5A.

Figure 6 is a comparison of the sequence of the wild-type ratTHY-1
gene (wt) and a synthetic ratTHY-1 gene (env) constructed by chemical
synthesis and having the most prevalent codons found in the HIV-1 env gene.

Figure 7 is a schematic diagram of the synthetic ratTHY-1 gene. The
solid black box denotes the signal peptide. The shaded box denotes the
sequences in the precursor which direct the attachment of a phosphatidylinositol
glycan anchor. Unique restriction sites used for assembly of the
THY-1 constructs are marked H (Hind3), M (MluI), S (SacI) and No (NotI).
The positions of the synthetic oligonucleotides employed in the construction are
shown at the bottom of the figure.

Figure 8 is a graph depicting the results of flow cytometry analysis.
In this experiment 293T cells were transiently transfected by calcium phosphate
co-precipitation with either a wild-type ratTHY-1 expression plasmid (thick line),
a ratTHY-1 with envelope codons expression plasmid (thin line), or vector only
(dotted line). Cells were stained with anti-ratTHY-1 monoclonal
antibody OX7 followed by a polyclonal FITC-conjugated anti-mouse IgG
antibody 3 days after transfection.

Figure 9A is a photograph of a gel illustrating the results of
immunoprecipitation analysis of supernatants of human 293T cells transfected
with either syngp120mn (A) or a construct, syngp120mn.rTHY-1env, which has
the rTHY-1env gene in the 3' untranslated region of the syngp120mn gene (B).
The syngp120mn.rTHY-1env construct was generated by inserting a NotI
adapter into the blunted Hind3 site of the rTHY-1env plasmid. Subsequently,
a 0.5 kb NotI fragment containing the rTHY-1env gene was cloned into the
NotI site of the syngp120mn plasmid and tested for correct orientation.
Supernatants of 35S-labeled cells were harvested 72 hours post-transfection,
precipitated with CD4:IgG fusion protein and protein A agarose, and run on a
7% reducing SDS-PAGE gel.

Figure 9B is a schematic diagram of the constructs used in the
experiment depicted in Figure 9A.

Figure 10A is a photograph of COS cells transfected with vector only
showing no GFP fluorescence.

Figure 10B is a photograph of COS cells transfected with a CDM7
expression plasmid encoding native GFP engineered to include a consensus
translational initiation sequence.

Figure 10C is a photograph of COS cells transfected with an 
expression plasmid having the same flanking sequences and initiation 
consensus as in Figure 10B, but bearing a codon optimized gene sequence. 

Figure 10D is a photograph of COS cells transfected with an
expression plasmid as in Figure 10C, but bearing a Thr at residue 65 in place of
Ser.

Figure 11 depicts the sequence of a synthetic gene encoding green
fluorescent protein (SEQ ID NO:40).

Figure 12 depicts the sequence of a native human Factor VIII gene 
lacking the central B domain (amino acids 760-1639, inclusive) (SEQ ID 
NO:41). 

Figure 13 depicts the sequence of a synthetic human Factor VIII 
gene lacking the central B domain (amino acids 760-1639, inclusive) (SEQ ID 
NO:42). 

Description of the Preferred Embodiments 

EXAMPLE I 

Construction of a Synthetic gp120 Gene Having Codons Found in Highly
Expressed Human Genes

A codon frequency table for the envelope precursor of the LAV
subtype of HIV-1 was generated using software developed by the University of
Wisconsin Genetics Computer Group. The results of that tabulation are
contrasted in Table 1 with the pattern of codon usage by a collection of highly
expressed human genes. For any amino acid encoded by degenerate codons,
the most favored codon of the highly expressed genes is different from the most
favored codon of the HIV envelope precursor. Moreover, a simple rule
describes the pattern of favored envelope codons wherever it applies: preferred
codons maximize the number of adenine residues in the viral RNA. In all cases
but one this means that the codon in which the third position is A is the most
frequently used. In the special case of serine, three codons equally contribute
one A residue to the mRNA; together these three comprise 85% of the serine
codons actually used in envelope transcripts. A particularly striking example of
the A bias is found in the codon choice for arginine, in which the AGA triplet
comprises 88% of the arginine codons. In addition to the preponderance of A
residues, a marked preference is seen for uridine among degenerate codons
whose third residue must be a pyrimidine. Finally, the inconsistencies among
the less frequently used variants can be accounted for by the observation that
the dinucleotide CpG is underrepresented; thus the third position is less likely to
be G whenever the second position is C, as in the codons for alanine, proline,
serine and threonine; and the CGX triplets for arginine are hardly used at all.

TABLE 1: Codon Frequency in the HIV-1 IIIb env gene and in highly
expressed human genes.

Codon frequency was calculated using the GCG program established by the
University of Wisconsin Genetics Computer Group. Numbers represent the
percentage of cases in which the particular codon is used. Codon usage
frequencies of envelope genes of other HIV-1 virus isolates are comparable and
show a similar bias.
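The tabulation described in this caption, the percentage of cases in which each codon is used for its amino acid, can be approximated as follows. This is a hypothetical stand-in written in Python, not the GCG program used by the authors; it assumes an in-frame coding sequence and the standard genetic code, and it also reports the share of codons ending in A to mirror the adenine bias discussed above.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code; codons enumerated with bases in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def _codons(seq: str) -> list:
    seq = seq.lower()
    if not seq or len(seq) % 3:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_usage(seq: str) -> dict:
    """For each amino acid, the percentage of its codons accounted for by
    each synonymous codon observed in `seq`."""
    counts = Counter(_codons(seq))
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    usage = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODE[codon]
        usage[aa][codon] = 100.0 * n / totals[aa]
    return dict(usage)


def percent_a_third_position(seq: str) -> float:
    """Percentage of codons whose third base is A."""
    cs = _codons(seq)
    return 100.0 * sum(c[2] == "a" for c in cs) / len(cs)


print(codon_usage("agaagaaggtaa")["R"])
print(percent_a_third_position("agaaggtaa"))
```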



In order to produce a gp120 gene capable of high level expression in
mammalian cells, a synthetic gene encoding the gp120 segment of HIV-1 was
constructed (syngp120mn), based on the sequence of the most common North
American subtype, HIV-1 MN (Shaw et al., Science 226:1165, 1984; Gallo et
al., Nature 321:119, 1986). In this synthetic gp120 gene nearly all of the native
codons have been systematically replaced with codons most frequently used in
highly expressed human genes (Figure 1). This synthetic gene was assembled
from chemically synthesized oligonucleotides of 150 to 200 bases in length. If
oligonucleotides exceeding 120 to 150 bases are chemically synthesized, the
percentage of full-length product can be low, and the vast excess of material
consists of shorter oligonucleotides. Since these shorter fragments inhibit
cloning and PCR procedures, it can be very difficult to use oligonucleotides
exceeding a certain length. In order to use crude synthesis material without
prior purification, single-stranded oligonucleotide pools were PCR amplified
before cloning. PCR products were purified in agarose gels and used as
templates in the next PCR step. Two adjacent fragments could be co-amplified
because of overlapping sequences at the end of either fragment. These
fragments, which were between 350 and 400 bp in size, were subcloned into a
pCDM7-derived plasmid containing the leader sequence of the CD5 surface
molecule followed by a NheI/PstI/MluI/EcoRI/BamHI polylinker. Each of
the restriction enzymes in this polylinker represents a site that is present at
either the 5' or 3' end of the PCR-generated fragments. Thus, by sequential
subcloning of each of the 4 long fragments, the whole gp120 gene was
assembled. For each fragment three to six different clones were subcloned and
sequenced prior to assembly. A schematic drawing of the method used to
construct the synthetic gp120 is shown in Figure 2. The sequence of the
synthetic gp120 gene (and a synthetic gp160 gene created using the same
approach) is presented in Figure 1.
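The overlap-based assembly can be illustrated with a simple split-and-rejoin. This Python sketch is illustrative only: the 180-base fragment length and 20-base overlap are assumed values (the text gives 150 to 200 base oligonucleotides but no overlap length), the function names are hypothetical, and it models only the sequence bookkeeping, not PCR.

```python
def split_overlapping(seq: str, length: int = 180, overlap: int = 20) -> list:
    """Cut `seq` into fragments of at most `length` bases; consecutive
    fragments share `overlap` bases so they can be joined end to end."""
    if not 0 < overlap < length:
        raise ValueError("need 0 < overlap < length")
    step = length - overlap
    fragments, start = [], 0
    while True:
        fragments.append(seq[start:start + length])
        if start + length >= len(seq):
            return fragments
        start += step


def join_overlapping(fragments: list, overlap: int = 20) -> str:
    """Rejoin fragments, checking that each one begins with the last
    `overlap` bases of the sequence assembled so far."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        assembled += frag[overlap:]
    return assembled


gene = "acgt" * 100  # placeholder 400-base sequence
pieces = split_overlapping(gene)
print([len(p) for p in pieces], join_overlapping(pieces) == gene)
```

Only the last fragment can be shorter than `length`, and it is always longer than the overlap, so every junction carries the full overlap.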

The mutation rate was considerable. The most commonly found mutations were
short (1 nucleotide) and long (up to 30 nucleotides) deletions. In some cases,
mutated regions had to be replaced with synthetic adapters or with the
corresponding segments of other subclones that carried no mutation in that
region. Some deviations from strict adherence to optimized codon usage were
made to allow the introduction of restriction sites into the resulting gene,
which facilitates the replacement of individual segments (Figure 2). These
unique restriction sites were introduced into the gene at approximately 100 bp
intervals. The native HIV leader sequence was exchanged for the highly
efficient leader peptide of the human CD5 antigen to facilitate secretion
(Aruffo et al., Cell 61:1303, 1990). The plasmid used for construction is a
derivative of the mammalian expression vector pCDM7, which transcribes the
inserted gene under the control of the strong human CMV immediate early
promoter.
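The core of the codon optimization is a back-translation in which each residue is assigned its preferred codon. The following is a minimal Python sketch, assuming the preferred codons listed in the Summary of the Invention. The Summary names no Glu codon, so gag is an assumption flagged in the code; Met and Trp each have only one codon. The sketch does not cover the later step of adjusting individual codons to create unique restriction sites about every 100 bp.

```python
# Sketch of back-translation with preferred codons (one-letter amino acid codes).
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "G": "ggc", "H": "cac", "I": "atc", "L": "ctg",
    "K": "aag", "P": "ccc", "F": "ttc", "S": "agc", "T": "acc",
    "Y": "tac", "V": "gtg",
    "E": "gag",  # assumption: the Summary's preferred list gives no Glu codon
    "M": "atg", "W": "tgg",  # single-codon amino acids
}


def back_translate(protein: str) -> str:
    """Encode a one-letter protein sequence using only preferred codons."""
    try:
        return "".join(PREFERRED[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"unsupported residue: {err.args[0]}") from None


if __name__ == "__main__":
    # N-terminal residues of mature gp120 as encoded at the start of oligo 1.
    print(back_translate("TEKLWVTVYYGVPV"))
```

For this stretch, the output matches the first 14 codons of oligo 1 (SEQ ID NO:2).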

To compare the wild-type and synthetic gp120 coding sequences, the
synthetic gp120 coding sequence was inserted into a mammalian expression
vector and tested in transient transfection assays. Several different native
gp120 genes were used as controls to exclude variations in expression levels
between different virus isolates and artifacts induced by distinct leader
sequences. The gp120 HIV IIIb construct used as a control was generated by
PCR using a SalI/XhoI HIV-1 HXB2 envelope fragment as template. To exclude
PCR-induced mutations, a KpnI/EarI fragment containing approximately 1.2 kb
of the gene was exchanged with the corresponding sequence from the proviral
clone. The wild-type gp120mn constructs used as controls were cloned by PCR
from HIV-1 MN-infected C8166 cells (AIDS Repository, Rockville, MD) and
expressed gp120 with either a native envelope or a CD5 leader sequence. Since
proviral clones were not available in this case, two clones of each construct
were tested to rule out PCR artifacts. To determine the amount of secreted
gp120 semi-quantitatively, supernatants of 293T cells transiently transfected
by calcium phosphate co-precipitation were immunoprecipitated with soluble
CD4:immunoglobulin fusion protein and protein A Sepharose.

The results of this analysis (Figure 3) show that the synthetic gene
product is expressed at a very high level compared to the native gp120
controls. The molecular weight of the synthetic gp120 gene product was
comparable to that of the control proteins (Figure 3) and appeared to be in the
range of 100 to 110 kD. The slightly faster migration than expected for gp120
may reflect the fact that in some transformed cell lines, e.g., 293T,
glycosylation is either incomplete or altered to some extent.

To compare expression more accurately, gp120 protein levels were
quantitated using a gp120 ELISA with CD4 as the immobilized phase. The ELISA
data (Figure 4) were comparable to the immunoprecipitation data, with a gp120
concentration of approximately 125 ng/ml for the synthetic gp120 gene and less
than the background cutoff (5 ng/ml) for all the native gp120 genes. Thus,
expression of the synthetic gp120 gene appears to be at least one order of
magnitude higher than that of the wild-type gp120 genes; in the experiment
shown, the increase was at least 25-fold.
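The "at least 25-fold" figure is a lower bound, because the native controls fell below the assay cutoff. A control reading below the cutoff can be at most the cutoff, so dividing the synthetic-gene value by the cutoff gives the smallest fold change consistent with the data. A minimal Python sketch using the two values quoted above (the function name is hypothetical):

```python
def min_fold_increase(synthetic_ng_ml: float, detection_limit_ng_ml: float) -> float:
    """Smallest fold increase consistent with a control below the detection limit."""
    if detection_limit_ng_ml <= 0:
        raise ValueError("detection limit must be positive")
    return synthetic_ng_ml / detection_limit_ng_ml


print(min_fold_increase(125, 5))  # 25.0
```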
The Role of rev in gp120 Expression

Since rev appears to exert its effect at several steps in the expression
of a viral transcript, the possible role of non-translational effects in the
improved expression of the synthetic gp120 gene was tested. First, to rule out
the possibility that changing the nucleotide sequence had eliminated negative
signal elements conferring increased mRNA degradation or nuclear retention,
cytoplasmic mRNA levels were measured. Cytoplasmic RNA was prepared by NP40
lysis of transiently transfected 293T cells and subsequent removal of the
nuclei by centrifugation. The RNA was then purified from the lysates by
multiple phenol extractions and precipitation, spotted on nitrocellulose using
a slot blot apparatus, and hybridized with an envelope-specific probe.

Briefly, cytoplasmic mRNA from 293 cells transfected with CDM7,
gp120 IIIb, or syngp120 was isolated 36 hours post transfection. Cytoplasmic
RNA from HeLa cells infected with wild-type vaccinia virus, or with recombinant
virus expressing gp120 IIIb or the synthetic gp120 gene under the control of
the 7.5k promoter, was isolated 16 hours post infection. Equal amounts were
spotted on nitrocellulose using a slot blot device and hybridized with
random-primed labeled 1.5 kb gp120IIIb and syngp120 fragments or with human
beta-actin. RNA expression levels were quantitated by scanning the hybridized
membranes with a phosphorimager. The procedures used are described in greater
detail below.
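Because every sample was also probed for beta-actin, one way to compare envelope mRNA levels across samples is to divide each background-subtracted envelope signal by the actin signal from the same RNA. This Python sketch is an assumption about the analysis, since the text does not say how the phosphorimager counts were normalized. Comparing the native and synthetic genes this way also assumes that the gp120IIIb and syngp120 probes were labeled to similar specific activity, which the text does not state.

```python
from __future__ import annotations


def normalized_signal(target_counts: float, actin_counts: float,
                      target_background: float = 0.0,
                      actin_background: float = 0.0) -> float:
    """Background-subtracted envelope signal divided by the beta-actin signal
    of the same RNA sample (loading control). Hypothetical helper."""
    actin = actin_counts - actin_background
    if actin <= 0:
        raise ValueError("beta-actin signal must exceed background")
    return (target_counts - target_background) / actin


def relative_mrna(synthetic: tuple[float, float], native: tuple[float, float]) -> float:
    """Actin-normalized synthetic-gene mRNA relative to native-gene mRNA.
    Each argument is (envelope-probe counts, beta-actin counts)."""
    return normalized_signal(*synthetic) / normalized_signal(*native)
```

A ratio near 1 would correspond to the "no significant difference" result described next; a ratio below 1 would correspond to a lower synthetic-gene mRNA level.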

This experiment demonstrated that there was no significant
difference in the mRNA levels of cells transfected with either the native or the
synthetic gp120 gene. In fact, in some experiments the cytoplasmic mRNA level
of the synthetic gp120 gene was even lower than that of the native gp120 gene.
These data were confirmed by measuring expression from recombinant vaccinia
viruses. Human 293 cells or HeLa cells were infected with vaccinia virus
expressing wild-type gp120 IIIb or syngp120mn at a multiplicity of infection of
at least 10. Supernatants were harvested 24 hours post infection and
immunoprecipitated with CD4:immunoglobulin fusion protein and protein A
Sepharose. The procedures used in this experiment are described in greater
detail below.

This experiment showed that the increased expression of the
synthetic gene was still observed when the native and the synthetic gene
products were expressed from vaccinia virus recombinants under the control of
the strong mixed early and late 7.5k promoter. Because vaccinia virus mRNAs are
transcribed and translated in the cytoplasm, increased expression of the
synthetic envelope gene in this experiment cannot be attributed to improved
export from the nucleus. This experiment was repeated in two additional human
cell types, the human embryonic kidney cell line 293 and HeLa cells. As with
transfected 293T cells, mRNA levels were similar in 293 cells infected with
either recombinant vaccinia virus.
Codon Usage in Lentiviruses

Because codon usage appears to have a significant impact on
expression in mammalian cells, the codon frequency in the envelope genes of
other retroviruses was examined. This study found no clear pattern of codon
preference among retroviruses in general. However, when viruses from the
lentivirus genus, to which HIV-1 belongs, were analyzed separately, a codon
usage bias almost identical to that of HIV-1 was found. A codon frequency
table was prepared from the envelope glycoproteins of a variety of
(predominantly type C) retroviruses, excluding the lentiviruses, and compared
with a codon frequency table created from the envelope sequences of four
lentiviruses not closely related to HIV-1 (caprine arthritis encephalitis
virus, equine infectious anemia virus, feline immunodeficiency virus, and visna
virus) (Table 2). The codon usage pattern for lentiviruses is strikingly
similar to that of HIV-1: in all cases but one, the preferred codon for HIV-1
is the same as the preferred codon for the other lentiviruses. The exception is
proline, which is encoded by CCT at 41% of the proline positions in non-HIV
lentiviral envelopes and by CCA at 40%, a near-tie that also reflects a
significant preference for the triplet ending in A. The codon usage of the
non-lentiviral envelope proteins does not show a similar predominance of A
residues, and it is also not as skewed toward third-position C and G residues
as the codon usage of highly expressed human genes. In general, non-lentiviral
retroviruses appear to use the different codons more equally, a pattern they
share with less highly expressed human genes.
TABLE 2: Codon frequency (%) in the envelope genes of lentiviruses (Lenti)
and non-lentiviral retroviruses (Other)

Amino acid   Codon   Other   Lenti
Ala          GCC       45      13
             GCT       26      37
             GCA       20      46
             GCG        9       3
Arg          CGC       14       2
             CGT        6
             CGA       16      15
             CGG       17       3
             AGA       31      51
             AGG       15      26
Asn          AAC       49      31
             AAT       51      69
Asp          GAC       55      33
             GAT       51      69
Cys          TGC       53      21
             TGT       47      79
Gln          CAA       52      69
             CAG       48      31
Glu          GAA       57      68
             GAG       43      32
Gly          GGC       21       8
             GGT       13       9
             GGA       37      56
             GGG       29      26
His          CAC       51      38
             CAT       49      62
Ile          ATC       38      16
             ATT       31      22
             ATA       31      61
Leu          CTC       22       8
             CTT       14       9
             CTA       21      16
             CTG       19      11
             TTA       15      41
             TTG       10      16
Lys          AAA       60      63
             AAG       40      37
Phe          TTC       52      25
             TTT       48      75
Pro          CCC       42      14
             CCT       30      41
             CCA       20      40
             CCG        7       5
Ser          TCC       38      10
             TCT       17      16
             TCA       18      24
             TCG        6       5
             AGC       13      20
             AGT        7      25
Thr          ACC       44      18
             ACT       27      20
             ACA
             ACG       10       8
Tyr          TAC       48      28
             TAT       52      72
Val          GTC       36       9
             GTT       17      10
             GTA       22      54
             GTG       25      27

Codon frequency was calculated using the GCG software developed by the
University of Wisconsin Genetics Computer Group. Numbers represent the
percentage of the codons for each amino acid that are the indicated triplet.
Codon usage of non-lentiviral retroviruses was compiled from the envelope
precursor sequences of bovine leukemia virus, feline leukemia virus, human
T-cell leukemia virus type I, human T-cell lymphotropic virus type II, the mink
cell focus-forming isolate of murine leukemia virus (MuLV), the Rauscher spleen
focus-forming isolate, the 10A1 isolate, the 4070A amphotropic isolate and the
myeloproliferative leukemia virus isolate, and from rat leukemia virus, simian
sarcoma virus, simian T-cell leukemia virus, leukemogenic retrovirus T1223/B
and gibbon ape leukemia virus. The codon frequency tables for the non-HIV,
non-SIV lentiviruses were compiled from the envelope precursor sequences of
caprine arthritis encephalitis virus, equine infectious anemia virus, feline
immunodeficiency virus, and visna virus.
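Table 2 reports, for each amino acid, the share of its codons that use each triplet. The Python sketch below is a hypothetical reimplementation, not the GCG program. It assumes that counts are pooled over all sequences in a group; the text does not say whether pooled counts or per-virus percentages were averaged. It also assumes that each sequence starts in frame.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def codon_percentages(sequences: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Percentage of each amino acid's codons that are a given triplet,
    with codon counts pooled over all in-frame coding sequences supplied."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.lower()
        for i in range(0, len(seq) - 2, 3):   # ignore a trailing partial codon
            codon = seq[i:i + 3]
            if codon in CODE:                 # skip codons with ambiguous bases
                counts[codon] += 1
    totals: Counter = Counter()
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    table: Dict[str, Dict[str, float]] = defaultdict(dict)
    for codon, n in counts.items():
        aa = CODE[codon]
        table[aa][codon] = 100.0 * n / totals[aa]
    return dict(table)
```

Pooling weights each virus by its envelope length. Averaging per-virus percentages would weight each virus equally instead, and the two approaches can give slightly different tables.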



In addition to the prevalence of codons containing an A, lentiviral
codons follow the HIV pattern of strong CpG underrepresentation, so that the
third position of alanine, proline, serine and threonine triplets is rarely G.
The non-lentiviral retroviral envelope triplets show a similar, but less
pronounced, underrepresentation of CpG. The most obvious difference between
lentiviruses and other retroviruses with respect to CpG prevalence lies in the
usage of the CGX variant of arginine triplets, which is reasonably frequent
among the retroviral envelope coding sequences but almost never present among
the comparable lentivirus sequences.
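The contrast in preferred codons can be read directly from Table 2. This Python sketch uses a hand-transcribed subset of the table (Ile, Gly, Leu, Pro, Val) and reports the most-used codon for each group. For these amino acids, the lentiviral column favors A-ending triplets (ATA, GGA, GTA) or TTA for leucine, and proline shows the near-tie between CCT (41%) and CCA (40%) noted above. The dictionary layout and function name are illustrative assumptions.

```python
from typing import Dict, Tuple

# Subset of Table 2, transcribed by hand: codon -> (Other %, Lenti %).
TABLE2: Dict[str, Dict[str, Tuple[int, int]]] = {
    "Ile": {"ATC": (38, 16), "ATT": (31, 22), "ATA": (31, 61)},
    "Gly": {"GGC": (21, 8), "GGT": (13, 9), "GGA": (37, 56), "GGG": (29, 26)},
    "Leu": {"CTC": (22, 8), "CTT": (14, 9), "CTA": (21, 16),
            "CTG": (19, 11), "TTA": (15, 41), "TTG": (10, 16)},
    "Pro": {"CCC": (42, 14), "CCT": (30, 41), "CCA": (20, 40), "CCG": (7, 5)},
    "Val": {"GTC": (36, 9), "GTT": (17, 10), "GTA": (22, 54), "GTG": (25, 27)},
}
OTHER, LENTI = 0, 1  # column indices into each (Other, Lenti) tuple


def preferred(codons: Dict[str, Tuple[int, int]], column: int) -> str:
    """Most frequently used codon for one amino acid in one virus group."""
    return max(codons, key=lambda codon: codons[codon][column])


if __name__ == "__main__":
    for aa, codons in TABLE2.items():
        print(f"{aa}: Other {preferred(codons, OTHER)}, Lenti {preferred(codons, LENTI)}")
```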
Differences in rev Dependence Between Native and Synthetic gp120

To examine whether regulation by rev is connected to HIV-1 codon
usage, the influence of rev on the expression of both the native and the
synthetic gene was investigated. Since regulation by rev requires the
rev-binding site RRE in cis, constructs were made in which this binding site
was cloned into the 3' untranslated region of both the native and the synthetic
gene. These plasmids were co-transfected into 293T cells together with either
rev or a control plasmid in trans, and gp120 expression levels in supernatants
were measured semi-quantitatively by immunoprecipitation. The procedures used
in this experiment are described in greater detail below.

As shown in Figure 5A and Figure 5B, rev upregulates the native
gp120 gene but has no effect on the expression of the synthetic gp120 gene.
Thus, the action of rev is not apparent on a substrate that lacks the native
viral envelope coding sequence.

Expression of a synthetic ratTHY-1 gene with HIV envelope codons

The above-described experiment suggests that "envelope sequences"
must in fact be present for rev regulation. To test this hypothesis, a
synthetic version of the gene encoding the small, typically highly expressed
cell surface protein rat THY-1 antigen was prepared. The synthetic ratTHY-1
gene was designed to have a codon usage like that of HIV gp120. In designing
this synthetic gene, AUUUA sequences, which are associated with mRNA
instability, were avoided. In addition, two restriction sites were introduced
to simplify manipulation of the resulting gene (Figure 6). This synthetic gene
with HIV envelope codon usage (rTHY-1env) was generated using three
oligonucleotides of 150 to 170 bases (Figure 7). In contrast to the syngp120mn
gene, PCR products were cloned directly and assembled in pUC12, and
subsequently cloned into pCDM7.
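The two design constraints named above, no AUUUA motifs and a small set of unique restriction sites, can be checked mechanically on a candidate sequence. This Python sketch is a hypothetical helper, not the authors' tool. It scans the sense strand for ATTTA (AUUUA in the mRNA) and for the recognition sequences of the polylinker enzymes named earlier. Which two sites were actually introduced into rTHY-1env is not stated, so the enzyme list is illustrative. A design would pass if the AUUUA list is empty and each intended site occurs exactly once.

```python
import re
from typing import Dict, List

# Standard recognition sequences of the polylinker enzymes named in the text.
SITES: Dict[str, str] = {
    "NheI": "gctagc",
    "PstI": "ctgcag",
    "MluI": "acgcgt",
    "EcoRI": "gaattc",
    "BamHI": "ggatcc",
}
INSTABILITY_MOTIF = "attta"  # AUUUA in the mRNA, written as sense-strand DNA


def find_all(seq: str, motif: str) -> List[int]:
    """0-based start positions of `motif` in `seq`, overlapping hits included."""
    pattern = "(?=" + re.escape(motif.lower()) + ")"
    return [m.start() for m in re.finditer(pattern, seq.lower())]


def design_report(seq: str) -> Dict[str, List[int]]:
    """Positions of each restriction site and of AUUUA (sense strand only)."""
    report = {name: find_all(seq, site) for name, site in SITES.items()}
    report["AUUUA"] = find_all(seq, INSTABILITY_MOTIF)
    return report


if __name__ == "__main__":
    # Oligo 1 forward (SEQ ID NO:1) carries the NheI cloning site.
    print(design_report("cgcgggctagccaccgagaagctg"))
```

The zero-width lookahead lets overlapping motifs (for example, ATTTATTTA) be counted as two hits rather than one.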

Expression levels of native rTHY-1 and of rTHY-1 with HIV
envelope codons were quantitated by immunofluorescence of transiently
transfected 293T cells. Figure 8 shows that expression of the native THY-1
gene is almost two orders of magnitude above the background level of the
control-transfected cells (pCDM7). In contrast, expression of the synthetic
ratTHY-1 is substantially lower than that of the native gene (shown by the
shift of the peak towards a lower channel number).

To confirm that no negative sequence elements promoting mRNA
degradation were inadvertently introduced, a construct was generated in which
the rTHY-1env gene was cloned at the 3' end of the synthetic gp120 gene
(Figure 9B). In this experiment, 293T cells were transfected with either the
syngp120mn gene or the syngp120/ratTHY-1env fusion gene
(syngp120mn.rTHY-1env). Expression was measured by immunoprecipitation
with CD4:IgG fusion protein and protein A agarose. The procedures used in
this experiment are described in greater detail below.

Since the synthetic gp120 gene terminates with a UAG stop codon, rTHY-1env
is not translated from this transcript. If negative elements conferring enhanced
degradation were present in the sequence, gp120 protein levels expressed from
this construct should be decreased in comparison to the syngp120mn construct
without rTHY-1env. Figure 9A shows that the expression of both constructs is
similar, indicating that the low expression must be linked to translation.
Rev-dependent expression of synthetic ratTHY-1 gene with envelope codons

To explore whether rev is able to regulate expression of a ratTHY-1
gene having env codons, a construct was made with a rev-binding site at the 3'
end of the rTHY-1env open reading frame. To measure the rev-responsiveness of
the ratTHY-1env construct having a 3' RRE, human 293T cells were
cotransfected with ratTHY-1envrre and either CDM7 or pCMVrev. At 60 hours
post transfection, cells were detached with 1 mM EDTA in PBS and stained
with the OX-7 anti-rTHY-1 mouse monoclonal antibody and a secondary
FITC-conjugated antibody. Fluorescence intensity was measured using an
EPICS XL cytofluorometer. These procedures are described in greater detail
below.

In repeated experiments, a slight increase of rTHY-1env expression
was detected when rev was cotransfected with the rTHY-1env gene. To further
increase the sensitivity of the assay system, a construct expressing a secreted
version of rTHY-1env was generated. This construct should produce more
reliable data because the accumulated amount of secreted protein in the
supernatant reflects protein production over an extended period, whereas
surface-expressed protein appears to reflect the current production rate more
closely. A gene capable of expressing a secreted form was prepared by PCR
using forward and reverse primers annealing 3' of the endogenous leader
sequence and 5' of the sequence motif required for phosphatidylinositol glycan
anchorage, respectively. The PCR product was cloned into a plasmid that
already contained a CD5 leader sequence, thus generating a construct in which
the membrane anchor has been deleted and the leader sequence replaced by a
heterologous (and probably more efficient) leader peptide.


WO 98/12207    PCT/US97/16639

The rev-responsiveness of the secreted form of ratTHY-1env was
measured by immunoprecipitation of supernatants of human 293T cells
cotransfected with a plasmid expressing a secreted form of ratTHY-1env with
the RRE sequence in cis (rTHY-1envPI-rre) and either CDM7 or pCMVrev.
The rTHY-1envPI-RRE construct was made by PCR using the oligonucleotide
cgcggggctagcgcaaagagtaataagtttaac (SEQ ID NO:38) as forward primer, the
oligonucleotide cgcggatcccttgtattttgtactaata (SEQ ID NO:39) as reverse
primer, and the synthetic rTHY-1env construct as template. After digestion
with NheI and NotI, the PCR fragment was cloned into a plasmid containing
CD5 leader and RRE sequences. Supernatants of 35S-labeled cells were
harvested 72 hours post transfection, precipitated with the mouse monoclonal
antibody OX7 against rTHY-1 and anti-mouse IgG Sepharose, and run on a
12% reducing SDS-PAGE gel.

In this experiment, the induction of rTHY-1env by rev was much
more prominent and clear-cut than in the above-described experiment, which
strongly suggests that rev is able to translationally regulate transcripts that
are suppressed by low-usage codons.

Rev-independent expression of a rTHY-1env:immunoglobulin fusion protein

To test whether low-usage codons must be present throughout the
whole coding sequence, or whether a short region is sufficient to confer
rev-responsiveness, a rTHY-1env:immunoglobulin fusion protein was generated.
In this construct, the rTHY-1env gene (without the sequence motif responsible
for phosphatidylinositol glycan anchorage) is linked to the human IgG1 hinge,
CH2 and CH3 domains. This construct was generated by anchor PCR using
primers with NheI and BamHI restriction sites and rTHY-1env as template.
The PCR fragment was cloned into a plasmid containing the leader sequence of
the CD5 surface molecule and the hinge, CH2 and CH3 parts of human IgG1
immunoglobulin. A HindIII/EagI fragment containing the rTHY-1envegl insert
was subsequently cloned into a pCDM7-derived plasmid with the RRE
sequence.
To measure the response of the rTHY-1env/immunoglobulin fusion
gene (rTHY-1enveglrre) to rev, human 293T cells were cotransfected with
rTHY-1enveglrre and either pCDM7 or pCMVrev. The rTHY-1enveglrre
construct was made by anchor PCR using forward and reverse primers with
NheI and BamHI restriction sites, respectively. The PCR fragment was cloned
into a plasmid containing a CD5 leader and human IgG1 hinge, CH2 and CH3
domains. Supernatants of 35S-labeled cells were harvested 72 hours post
transfection, precipitated with the mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE
gel. The procedures used are described in greater detail below.

As with the product of the rTHY-1envPI- gene, this
rTHY-1env/immunoglobulin fusion protein is secreted into the supernatant.
Thus, if a short region of envelope codons were sufficient, this gene should be
responsive to rev induction. However, in contrast to rTHY-1envPI-,
cotransfection of rev in trans induced no increase, or only a negligible one, in
rTHY-1envegl expression.

The expression of rTHY-1 immunoglobulin fusion proteins with
native rTHY-1 or HIV envelope codons was measured by immunoprecipitation.
Briefly, human 293T cells were transfected with either rTHY-1envegl (env
codons) or rTHY-1wtegl (native codons). The rTHY-1wtegl construct was
generated in a manner similar to that used for the rTHY-1envegl construct,
except that a plasmid containing the native rTHY-1 gene was used as
template. Supernatants of 35S-labeled cells were harvested 72 hours post
transfection, precipitated with the mouse monoclonal antibody OX7 against
rTHY-1 and anti-mouse IgG Sepharose, and run on a 12% reducing SDS-PAGE
gel. The procedures used in this experiment are described in greater detail
below.

Expression levels of rTHY-1envegl were decreased in comparison to
a similar construct with wild-type rTHY-1 as the fusion partner, but were still
considerably higher than those of rTHY-1env. Accordingly, both parts of the
fusion protein influenced expression levels: the addition of rTHY-1env did not
restrict expression to the level seen for rTHY-1env alone. Thus, regulation by
rev appears to be ineffective if protein expression is not almost completely
suppressed.

Codon preference in HIV-1 envelope genes

Direct comparison between the codon usage frequency of the HIV
envelope and that of highly expressed human genes reveals a striking difference
for every amino acid encoded by more than one codon. One simple measure of
the statistical significance of this codon preference is the finding that among
the nine amino acids with two-fold codon degeneracy, the favored third residue
is A or U in all nine. The probability that all nine of two equiprobable choices
will be the same is approximately 0.004, and hence by any conventional
measure the third-residue choice cannot be considered random. Further
evidence of a skewed codon preference is found among the more degenerate
codons, where a strong selection for triplets bearing adenine can be seen. This
contrasts with the pattern for highly expressed genes, which favor codons
bearing C, or less commonly G, in the third position of codons with three-fold
or higher degeneracy.
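The 0.004 figure can be checked exactly. If each of the nine two-fold degenerate amino acids independently favors an A/U or a G/C ending with probability 1/2, then the chance that all nine agree, in either direction, is 2 × (1/2)^9 = 1/256 ≈ 0.0039. If the direction is fixed in advance (all nine A/U), the probability halves to 1/512 ≈ 0.002. A short Python check, with hypothetical function names:

```python
from fractions import Fraction


def p_all_agree(n: int) -> Fraction:
    """P(n independent fair binary choices all fall the same way, either way)."""
    return 2 * Fraction(1, 2) ** n


def p_all_one_way(n: int) -> Fraction:
    """P(n independent fair binary choices all fall one pre-specified way)."""
    return Fraction(1, 2) ** n


print(p_all_agree(9), float(p_all_agree(9)))      # 1/256 0.00390625
print(p_all_one_way(9), float(p_all_one_way(9)))  # 1/512 0.001953125
```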

The systematic exchange of native codons for codons of highly
expressed human genes dramatically increased expression of gp120. A
quantitative analysis by ELISA showed that expression of the synthetic gene
was at least 25-fold higher than that of native gp120 after transient
transfection into human 293 cells. The concentrations measured in the ELISA
experiment shown were rather low. Because the ELISA used for quantification
is based on gp120 binding to CD4, only native, non-denatured material was
detected, which may explain the apparently low expression. Measurement of
cytoplasmic mRNA levels demonstrated that the difference in protein
expression is due to translational differences and not to mRNA stability.

Retroviruses in general do not show a preference for A and T similar
to that found for HIV. However, when this family was divided into two
subgroups, lentiviruses and non-lentiviral retroviruses, a similar preference
for A and, less frequently, T was detected at the third codon position for
lentiviruses. Thus, the available evidence suggests that lentiviruses retain a
characteristic pattern of envelope codons not because such residues confer an
inherent advantage for reverse transcription or replication, but rather for some
reason peculiar to the physiology of that class of viruses. The major difference
between lentiviruses and non-complex retroviruses is the presence of
additional regulatory and nonessential accessory genes in lentiviruses, as
already mentioned. Thus, one simple explanation for the restriction of envelope
expression might be that an important regulatory mechanism of one of these
additional molecules is based on it. In fact, one of these proteins, rev, most
likely has homologues in all lentiviruses. The hypothesis, then, is that codon
usage in viral mRNA is used to create a class of transcripts that is susceptible
to the stimulatory action of rev.
This hypothesis was tested using a strategy similar to the one above,
but this time codon usage was changed in the opposite direction: the codons of a
highly expressed cellular gene were replaced with the codons most frequently
used in the HIV envelope. As expected, expression levels were considerably
lower than those of the native molecule, by almost two orders of magnitude
when analyzed by immunofluorescence of the surface-expressed molecule. When
rev was coexpressed in trans and an RRE element was present in cis, only a
slight induction was found for the surface molecule. However, when THY-1 was
expressed as a secreted molecule, the induction by rev was much more
prominent, supporting the above hypothesis. This can probably be explained
by accumulation of secreted protein in the supernatant, which considerably
amplifies the rev effect. If rev induces only a minor increase for surface
molecules in general, then induction of HIV envelope by rev cannot serve to
increase surface abundance, but rather to increase the intracellular gp160
level. It is completely unclear at the moment why this should be the case.

To test whether small subregions of a gene are sufficient to
restrict expression and render it rev-dependent, rTHY-1env:immunoglobulin
fusion proteins were generated in which only about one third of the total gene
had the envelope codon usage. Expression levels of this construct were
intermediate, indicating that the rTHY-1env negative sequence element is not
dominant over the immunoglobulin part. This fusion protein was not, or only
slightly, rev-responsive, indicating that only genes whose expression is almost
completely suppressed can be rev-responsive.

Another characteristic feature found in the codon frequency tables
is a striking underrepresentation of CpG-containing triplets. A comparative
study of codon usage in E. coli, yeast, Drosophila and primates showed that,
in a large number of analyzed primate genes, the 8 least used codons are the
eight codons containing the CpG dinucleotide. Avoidance of codons containing
this dinucleotide motif was also found in the sequences of other retroviruses.
It seems plausible that the underrepresentation of CpG-bearing triplets is
related to the avoidance of gene silencing by methylation of CpG cytosines. The
observed number of CpG dinucleotides for HIV as a whole is about one fifth of
that expected on the basis of base composition. This may indicate that the
potential for high expression is preserved, and that the gene in fact has to be
highly expressed at some point during viral pathogenesis.
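"About one fifth of that expected" corresponds to a CpG observed/expected ratio of roughly 0.2. The Python sketch below uses the widely used formulation (number of CpG × sequence length) / (number of C × number of G). The text does not say which formula was used, so this formulation is an assumption, and the function name is hypothetical.

```python
def cpg_observed_expected(seq: str) -> float:
    """CpG observed/expected ratio: (#CpG * length) / (#C * #G).
    A value near 1 means CpG occurs as often as base composition predicts."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        raise ValueError("sequence must contain both C and G")
    cpg = sum(1 for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G")
    return cpg * len(s) / (c * g)
```

Applied to a whole HIV-1 genome, the text's statement would correspond to a value of roughly 0.2.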

The results presented herein clearly indicate that codon preference
has a severe effect on protein levels, and suggest that translational elongation
controls mammalian gene expression. However, other factors may play a role.
First, the abundance of mRNAs that are not maximally loaded with ribosomes in
eukaryotic cells indicates that initiation is rate limiting for translation in at
least some cases, since otherwise all transcripts would be completely covered
by ribosomes. Furthermore, if ribosome stalling and subsequent mRNA
degradation were the mechanism, suppression by rare codons could most likely
not be reversed by a regulatory mechanism like the one presented herein. One
possible explanation for the influence of both initiation and elongation on
translational activity is that the rate of initiation, or access to ribosomes, is
controlled in part by cues distributed throughout the RNA, such that the
lentiviral codons predispose the RNA to accumulate in a pool of poorly initiated
RNAs. However, this limitation need not be kinetic; for example, the choice of
codons could influence the probability that a given translation product, once
initiated, is properly completed. Under this mechanism, an abundance of less
favored codons would incur a significant cumulative probability of failure to
complete the nascent polypeptide chain. The sequestered RNA would then be
lent an improved rate of initiation by the action of rev. Since adenine residues
are abundant in rev-responsive transcripts, it could be that RNA adenine
methylation mediates this translational suppression.
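The "cumulative probability of failure" argument can be made concrete. If each codon independently aborts the nascent chain with its own small probability p_i, the chance of completing the chain is the product of (1 - p_i) over all codons. Small per-codon penalties therefore compound over a long open reading frame. This Python sketch illustrates that model only. The per-codon failure probability and ORF length in the usage block are hypothetical numbers, not measured values.

```python
import math
from typing import Iterable


def completion_probability(failure_probs: Iterable[float]) -> float:
    """Probability that translation runs to completion when each codon
    independently carries its own chance of aborting the chain."""
    log_total = 0.0
    for p in failure_probs:
        if not 0.0 <= p < 1.0:
            raise ValueError("per-codon failure probability must be in [0, 1)")
        log_total += math.log1p(-p)   # summing log(1 - p) avoids underflow
    return math.exp(log_total)


if __name__ == "__main__":
    # Hypothetical 500-codon ORF: each less favored codon aborts with
    # probability 0.002, and favored codons never abort.
    for n_rare in (25, 250):
        probs = [0.002] * n_rare + [0.0] * (500 - n_rare)
        print(n_rare, round(completion_probability(probs), 3))
```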
Detailed Procedures

The following procedures were used in the above-described experiments.

Sequence Analysis

Sequence analyses employed the software developed by the University of
Wisconsin Genetics Computer Group.

Plasmid constructions

Plasmid constructions employed the following methods. Vector and
insert DNA were digested at a concentration of 0.5 µg/10 µl in the appropriate
restriction buffer for 1-4 hours (total reaction volume approximately 30 µl).
Digested vector was treated with 10% (v/v) of 1 µg/ml calf intestine alkaline
phosphatase for 30 min prior to gel electrophoresis. Both vector and insert
digests (5 to 10 µl each) were run on a 1.5% low-melting agarose gel with TAE
buffer. Gel slices containing bands of interest were transferred into a 1.5 ml
reaction tube, melted at 65 °C and added directly to the ligation without
removal of the agarose. Ligations were typically done in a total volume of
25 µl in 1x Low buffer and 1x Ligation additions with 200-400 U of ligase, 1 µl
of vector, and 4 µl of insert. When necessary, 5' overhanging ends were filled
in by adding 1/10 volume of 250 µM dNTPs and 2-5 U of Klenow polymerase
to heat-inactivated or phenol-extracted digests and incubating for
approximately 20 min at room temperature. When necessary, 3' overhanging
ends were blunted by adding 1/10 volume of 2.5 mM dNTPs and 5-10 U of T4
DNA polymerase to heat-inactivated or phenol-extracted digests, followed by
incubation at 37 °C for 30 min. The following buffers were used in these
reactions: 10x Low buffer (60 mM Tris-HCl, pH 7.5, 60 mM MgCl2, 50 mM
NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol, 0.02% NaN3); 10x Medium
buffer (60 mM Tris-HCl, pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA,
70 mM β-mercaptoethanol, 0.02% NaN3); 10x High buffer (60 mM Tris-HCl,
pH 7.5, 60 mM MgCl2, 50 mM NaCl, 4 mg/ml BSA, 70 mM β-mercaptoethanol,
0.02% NaN3); 10x Ligation additions (1 mM ATP, 20 mM DTT, 1 mg/ml BSA,
10 mM spermidine); 50x TAE (2 M Tris acetate, 50 mM EDTA).
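For a given total volume, the ligation recipe above reduces to simple pipetting arithmetic: 1/10 volume each of 10x Low buffer and 10x Ligation additions, the melted gel slices, ligase, and water to volume. This Python sketch is a hypothetical helper. The 1 µl ligase volume is an assumption, because the text gives units (200-400 U) but not the stock concentration.

```python
def ligation_mix(total_ul: float = 25.0, vector_ul: float = 1.0,
                 insert_ul: float = 4.0, ligase_ul: float = 1.0) -> dict:
    """Pipetting volumes (µl) for one ligation in 1x Low buffer and 1x Ligation
    additions, both from 10x stocks. ligase_ul is a hypothetical default."""
    low_buffer = total_ul / 10
    ligation_additions = total_ul / 10
    used = low_buffer + ligation_additions + vector_ul + insert_ul + ligase_ul
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    return {
        "10x Low buffer": low_buffer,
        "10x Ligation additions": ligation_additions,
        "vector (melted gel slice)": vector_ul,
        "insert (melted gel slice)": insert_ul,
        "ligase": ligase_ul,
        "water": total_ul - used,
    }
```

With the default values, 14 µl of water brings the reaction to 25 µl.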

5 Oligonucleotide synthesis' and purification ' , 

Oligonucleotides were produced on a Milligen 8750 synthesizer (Millipore). The columns were eluted with 1 ml of 30% ammonium hydroxide, and the eluted oligonucleotides were deblocked at 55 °C for 6 to 12 hours. After deblocking, 150 µl of oligonucleotide was precipitated with 10 volumes of unsaturated n-butanol in 1.5 ml reaction tubes, followed by centrifugation at 15,000 rpm in a microfuge. The pellet was washed with 70% ethanol and resuspended in 50 µl of H2O. The concentration was determined by measuring the optical density at 260 nm of a 1:333 dilution (1 OD260 = 30 µg/ml).
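The concentration calculation above is a single multiplication: absorbance times dilution factor times the conversion factor. The short Python sketch below shows that arithmetic. The function name and the example readings are hypothetical; the conversion factors are the values stated in this text (30 µg/ml per OD260 unit for oligonucleotides, and 50 µg/ml for the plasmid DNA read at 1:200 in the large scale preparation), not sequence-specific extinction coefficients.

```python
def concentration_ug_per_ml(od260: float, dilution: float, ug_per_od: float) -> float:
    """Stock concentration from an OD260 reading of a diluted aliquot.

    od260     -- absorbance measured on the diluted sample
    dilution  -- dilution factor (333 for a 1:333 dilution)
    ug_per_od -- ug/ml corresponding to 1 OD260 unit
    """
    return od260 * dilution * ug_per_od


# Oligonucleotide: hypothetical reading of 0.10 at 1:333, 1 OD260 = 30 ug/ml
oligo = concentration_ug_per_ml(0.10, 333, 30.0)
print(f"oligo stock: {oligo:.0f} ug/ml")  # roughly 1 mg/ml

# Plasmid DNA: hypothetical reading of 0.25 at 1:200, 1 OD260 = 50 ug/ml
plasmid = concentration_ug_per_ml(0.25, 200, 50.0)
print(f"plasmid stock: {plasmid:.0f} ug/ml")
```

Multiplying the result by the resuspension volume (50 µl here) gives the total yield.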

The following oligonucleotides were used for construction of the synthetic gp120 gene (all sequences shown in this text are in the 5' to 3' direction):

oligo 1 forward (NheI): cgc ggg cta gcc acc gag aag ctg (SEQ ID NO:1).

oligo 1: acc gag aag ctg tgg gtg acc gtg tac tac ggc gtg ccc gtg tgg
aag ag ag gcc acc acc acc ctg ttc tgc gcc agc gac gcc aag gcg tac gac acc gag
gtg cac aac gtg tgg gcc acc cag gcg tgc gtg ccc acc gac ccc aac ccc cag gag gtg
gag ctc gtg aac gtg acc gag aac ttc aac at (SEQ ID NO:2).

oligo 1 reverse: cca cca tgt tgt tct tcc aca tgt tga agt tct c (SEQ ID NO:3).

oligo 2 forward: gac cga gaa ctt caa cat gtg gaa gaa caa cat (SEQ ID NO:4).

oligo 2: tgg aag aac aac atg gtg gag cag atg cat gag gac atc atc agc
ctg tgg gac cag agc ctg aag ccc tgc gtg aag ctg acc cc ctg tgc gtg acp tg aac tgc
acc gac ctg agg aac acc acc aac acc aac ac agc acc gcc aac aac aac agc aac agc
gag ggc acc atc aag ggc ggc gag atg (SEQ ID NO:5).

oligo 2 reverse (PstI): gtt gaa gct gca gtt ctt cat ctc gcc gcc ctt (SEQ ID NO:6).

oligo 3 forward (PstI): gaa gaa ctg cag ctt caa cat cac cac cag c (SEQ ID NO:7).

oligo 3: aac atc acc acc agc atc cgc gac aag atg cag aag gag tac gcc
ctg ctg tac aag ctg gat atc gtg agc atc gac aac gac agc acc agc tac cgc ctg atc tcc
tgc aac acc agc gtg atc acc cag gcc tgc ccc aag atc agc ttc gag ccc atc ccc atc
cac tac tgc gcc ccc gcc ggc ttc gcc (SEQ ID NO:8).

oligo 3 reverse: gaa ctt ctt gtc ggc ggc gaa gcc ggc ggg (SEQ ID NO:9).

oligo 4 forward: gcg ccc ccg ccg gct tcg cca tcc tga agt gca acg aca aga agt tc (SEQ ID NO:10).

oligo 4: gcc gac aag aag ttc agc ggc aag ggc agc tgc aag aac gtg agc
acc gtg cag tgc acc cac ggc atc cgg ccg gtg gtg agc acc cag ctc ctg ctg aac
ggc agc ctg gcc gag gag gag gtg gtg atc cgc agc gag aac ttc acc gac aac gcc aag
acc atc atc gtg cac ctg aat gag agc gtg cag atc (SEQ ID NO:11).

oligo 4 reverse (MluI): agt tgg gac gcg tgc agt tga tct gca cgc tct c (SEQ ID NO:12).

oligo 5 forward (MluI): gag agc gtg cag atc aac tgc acg cgt ccc (SEQ ID NO:13).

oligo 5: aac tgc acg cgt ccc aac tac aac aag cgc aag cgc atc cac atc
ggc ccc ggg cgc gcc ttc tac acc acc aag aac atc atc ggc acc atc ctc cag gcc cac
tgc aac atc tct aga (SEQ ID NO:14).

oligo 5 reverse: gtc gtt cca ctt ggc tct aga gat gtt gca (SEQ ID NO:15).

oligo 6 forward: gca aca tct cta gag cca agt gga acg ac (SEQ ID NO:16).

oligo 6: gcc aag tgg aac gac acc ctg cgc cag atc gtg agc aag ctg aag
gag cag ttc aag aac aag acc atc gtg ttc ac cag agc agc ggc ggc gac ccc gag atc
gtg atg cac agc ttc aac tgc ggc ggc (SEQ ID NO:17).

oligo 6 reverse (EcoRI): gca gta gaa gaa ttc gcc gcc gca gtt ga (SEQ ID NO:18).

oligo 7 forward (EcoRI): tca act gcg gcg gcg aat tct tct act gc (SEQ ID NO:19).

oligo 7: ggc gaa ttc ttc tac tgc aac acc agc ccc ctg ttc aac agc acc tgg
aac ggc aac aac acc tgg aac aac acc acc ggc agc aac aac aat att acc ctc cag tgc
aag atc aag cag atc atc aac atg tgg cag gag gtg ggc aag gcc atg tac gcc ccc ccc
atc gag ggc cag atc cgg tgc agc agc (SEQ ID NO:20).

oligo 7 reverse: gca gac cgg tga tgt tgc tgc tgc acc gga tct ggc cct c (SEQ ID NO:21).

oligo 8 forward: cga ggg cca gat ccg gtg cag cag caa cat cac cgg tct g (SEQ ID NO:22).

oligo 8: aac atc acc ggt ctg ctg ctg acc cgc gac ggc ggc aag gac acc
gac acc aac gac acc gaa atc ttc cgc ccc ggc ggc ggc gac atg cgc gac aac tgg aga
tct gag ctg tac aag tac aag gtg gtg acg atc gag ccc ctg ggc gtg gcc ccc acc aag
gcc aag cgc cgc gtg gtg cag cgc gag aag cgc (SEQ ID NO:23).

oligo 8 reverse (NotI): cgc ggg cgg ccg ctt tag cgc ttc tcg cgc tgc acc ac (SEQ ID NO:24).
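Each flanking primer above is named for the restriction site it introduces, so a quick consistency check is to confirm that every named site is actually present. The Python sketch below does this with plain string search. The recognition sequences are the standard ones for these enzymes; the dictionary layout and helper name are hypothetical. Only one strand is searched, which is sufficient here because all five sites are palindromic.

```python
# Standard recognition sequences (all palindromic, so one strand suffices).
SITES = {
    "NheI": "GCTAGC",
    "PstI": "CTGCAG",
    "MluI": "ACGCGT",
    "EcoRI": "GAATTC",
    "NotI": "GCGGCCGC",
}

# Flanking primers for the synthetic gp120 gene from the list above,
# with the spaces between codons removed.
PRIMERS = {
    "oligo 1 forward": ("NheI", "cgcgggctagccaccgagaagctg"),
    "oligo 2 reverse": ("PstI", "gttgaagctgcagttcttcatctcgccgccctt"),
    "oligo 5 forward": ("MluI", "gagagcgtgcagatcaactgcacgcgtccc"),
    "oligo 6 reverse": ("EcoRI", "gcagtagaagaattcgccgccgcagttga"),
    "oligo 8 reverse": ("NotI", "cgcgggcggccgctttagcgcttctcgcgctgcaccac"),
}


def site_position(seq: str, enzyme: str) -> int:
    """Return the 0-based index of the enzyme's site in seq, or -1 if absent."""
    return seq.upper().find(SITES[enzyme])


for name, (enzyme, seq) in PRIMERS.items():
    pos = site_position(seq, enzyme)
    status = f"found at {pos}" if pos >= 0 else "MISSING"
    print(f"{name:16s} {enzyme:6s} {status}")
```

A site that turns up as MISSING usually points to a transcription error in the primer rather than a design flaw.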

The following oligonucleotides were used for the construction of the rat THY-1env gene.

oligo 1 forward (BamHI/HindIII): cgc ggg gga tcc aag ctt acc atg att
cca gta ata agt (SEQ ID NO:25).

oligo 1: atg aat cca gta ata agt ata aca tta tta tta agt gta tta caa atg
agt aga gga caa aga gta ata agt tta aca gca tct tta gta aat caa aat ttg aga tta gat tgt
aga cat gaa aat aat aca aat ttg cca ata caa cat gaa ttt tca tta acg (SEQ ID NO:26).

oligo 1 reverse (EcoRI/MluI): cgc ggg gaa ttc acg cgt taa tga aaa ttc
atg ttg (SEQ ID NO:27).

oligo 2 forward (BamHI/MluI): cgc gga tcc acg cgt gaa aaa aaa aaa
cat (SEQ ID NO:28).

oligo 2: cgt gaa aaa aaa aaa cat gta tta agt gga aca tta gga gta cca gaa
cat aca tat aga agt aga gta aat ttg ttt agt gat aga ttc ata aaa gta tta aca tta gca aat
ttt aca aca aaa gat gaa gga gat tat atg tgt gag (SEQ ID NO:29).

oligo 2 reverse (EcoRI/SacI): cgc gaa ttc gag ctc aca cat ata atc tcc
(SEQ ID NO:30).

oligo 3 forward (BamHI/SacI): cgc gga tcc gag ctc aga gta agt gga
caa (SEQ ID NO:31).

oligo 3: ctc aga gta agt gga caa aat cca aca agt agt aat aaa aca ata aat
gta ata aga gat aaa tta gta aaa tgt ga gga ata agt tta tta gta caa aat aca agt tgg tta
tta tta tta tta tta agt tta agt ttt tta caa gca aca gat ttt ata agt tta tga (SEQ ID NO:32).

oligo 3 reverse (EcoRI/NotI): cgc gaa ttc gcg gcc gct tca taa act tat
aaa atc (SEQ ID NO:33).
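Each reverse oligonucleotide is designed so that its 3' portion is the reverse complement of the 3' end of the corresponding long oligonucleotide. The Python sketch below checks this for THY-1 oligo 1 and its reverse primer (EcoRI/MluI). The function names are hypothetical, and the 21-base annealing length is an assumption chosen for illustration; the text itself states only that the amplifying oligonucleotides were 15 to 25 bases long (see Polymerase Chain Reaction).

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (case preserved)."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_to_3prime_end(template: str, reverse_primer: str, length: int = 21) -> bool:
    """True if the last `length` bases of the reverse primer are the reverse
    complement of the last `length` bases of the template strand."""
    tail = reverse_primer[-length:]
    return template.endswith(reverse_complement(tail))


# 3' end of THY-1 oligo 1 and the full oligo 1 reverse primer,
# spaces between codons removed.
oligo1_tail = "aatttgccaatacaacatgaattttcattaacg"
oligo1_reverse = "cgcggggaattcacgcgttaatgaaaattcatgttg"

print(anneals_to_3prime_end(oligo1_tail, oligo1_reverse))
```

The 5' part of the reverse primer that does not anneal carries the EcoRI and MluI sites used for cloning.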

Polymerase Chain Reaction 

Short, overlapping 15- to 25-mer oligonucleotides annealing at both ends were used to amplify the long oligonucleotides by polymerase chain reaction (PCR). Typical PCR conditions were: 35 cycles, 55 °C annealing temperature, 0.2 sec extension time. PCR products were gel purified, phenol extracted, and used in a subsequent PCR to generate longer fragments consisting of two adjacent small fragments. These longer fragments were cloned into a CDM7-derived plasmid containing a leader sequence of the CD5 surface molecule followed by an NheI/PstI/MluI/EcoRI/BamHI polylinker.

The following solutions were used in these reactions: 10x PCR buffer (500 mM KCl, 100 mM Tris HCl, pH 7.5, 8 mM MgCl2, 2 mM each dNTP). The final buffer was supplemented with 10% DMSO to increase the fidelity of the Taq polymerase.
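Joining two adjacent PCR fragments in a second PCR relies on a stretch of sequence shared by the 3' end of one fragment and the 5' end of the next. The Python sketch below computes the product expected from such an overlap join. The sequences are short hypothetical examples, not fragments of the synthetic genes, and the minimum overlap of 8 bases is an arbitrary illustrative threshold.

```python
def overlap_join(left: str, right: str, min_overlap: int = 8) -> tuple[str, int]:
    """Join two fragments whose ends overlap.

    Finds the longest suffix of `left` that equals a prefix of `right`
    (at least `min_overlap` bases) and returns the joined sequence together
    with the overlap length. Raises ValueError if no such overlap exists.
    """
    left, right = left.upper(), right.upper()
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:], k
    raise ValueError("fragments do not overlap sufficiently")


# Hypothetical fragments sharing the 9-base sequence CTGACCGAG.
joined, k = overlap_join("ATGGCCAAGCTGACCGAG", "CTGACCGAGTTCAACATG")
print(k, joined)
```

Searching from the longest possible overlap downward returns the largest shared end, which is the one a real overlap extension would use.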

Small scale DNA preparation 

Transformed bacteria were grown in 3 ml LB cultures for more than 6 hours or overnight. Approximately 1.5 ml of each culture was poured into 1.5 ml microfuge tubes, spun for 20 seconds to pellet cells, and resuspended in 200 µl of solution I. Subsequently 400 µl of solution II and 300 µl of solution III were added. The microfuge tubes were capped, mixed, and spun for > 30 sec. Supernatants were transferred into fresh tubes and phenol extracted once. DNA was precipitated by filling the tubes with isopropanol, mixing, and spinning in a microfuge for > 2 min. The pellets were rinsed in 70% ethanol and resuspended in 50 µl dH2O containing 10 µl of RNase A. The following media and solutions were used in these procedures: LB medium (1.0% NaCl, 0.5% yeast extract, 1.0% tryptone); solution I (10 mM EDTA pH 8.0); solution II (0.2 M NaOH, 1.0% SDS); solution III (2.5 M KOAc, 2.5 M glacial acetic acid); phenol (pH adjusted to 6.0, overlaid with TE); TE (10 mM Tris HCl, pH 7.5, 1 mM EDTA pH 8.0).

Large scale DNA preparation 

One liter cultures of transformed bacteria were grown 24 to 36 hours (MC1061p3 transformed with pCDM derivatives) or 12 to 16 hours (MC1061 transformed with pUC derivatives) at 37 °C in either M9 bacterial medium (pCDM derivatives) or LB (pUC derivatives). Bacteria were spun down in 1 liter bottles using a Beckman J6 centrifuge at 4,200 rpm for 20 min. The pellet was resuspended in 40 ml of solution I. Subsequently, 80 ml of solution II and 40 ml of solution III were added, and the bottles were shaken semivigorously until lumps of 2 to 3 mm size developed. The bottle was spun at 4,200 rpm for 5 min and the supernatant was poured through cheesecloth into a 250 ml bottle.

Isopropanol was added to the top and the bottle was spun at 4,200 rpm for 10 min. The pellet was resuspended in 4.1 ml of solution I and added to 4.5 g of cesium chloride, 0.3 ml of 10 mg/ml ethidium bromide, and 0.1 ml of 1% Triton X-100 solution. The tubes were spun in a Beckman J2 high speed centrifuge at 10,000 rpm for 5 min. The supernatant was transferred into Beckman Quick Seal ultracentrifuge tubes, which were then sealed and spun in a Beckman ultracentrifuge using an NVT90 fixed angle rotor at 80,000 rpm for > 2.5 hours. The band was visualized under visible light and extracted using a 1 ml syringe and a 20 gauge needle. An equal volume of dH2O was added to the extracted material. DNA was extracted once with n-butanol saturated with 1 M sodium chloride, followed by addition of an equal volume of 10 M ammonium acetate/1 mM EDTA. The material was poured into a 13 ml snap tube, which was then filled to the top with absolute ethanol, mixed, and spun in a Beckman J2 centrifuge at 10,000 rpm for 10 min. The pellet was rinsed with 70% ethanol and resuspended in 0.5 to 1 ml of H2O. The DNA concentration was determined by measuring the optical density at 260 nm of a 1:200 dilution (1 OD260 = 50 µg/ml).

The following media and buffers were used in these procedures: M9 bacterial medium (10 g M9 salts, 10 g casamino acids (hydrolyzed), 10 ml M9 additions, 7.5 µg/ml tetracycline (500 µl of a 15 mg/ml stock solution), 12.5 µg/ml ampicillin (125 µl of a 10 mg/ml stock solution)); M9 additions (10 mM CaCl2, 100 mM MgSO4, 200 µg/ml thiamine, 70% glycerol); LB medium (1.0% NaCl, 0.5% yeast extract, 1.0% tryptone); Solution I (10 mM EDTA pH 8.0); Solution II (0.2 M NaOH, 1.0% SDS); Solution III (2.5 M KOAc, 2.5 M HOAc).

Sequencing

Synthetic genes were sequenced by the Sanger dideoxynucleotide method. In brief, 20 to 50 µg of double-stranded plasmid DNA were denatured in 0.5 M NaOH for 5 min. Subsequently the DNA was precipitated with 1/10 volume of sodium acetate (pH 5.2) and 2 volumes of ethanol and centrifuged for 5 min. The pellet was washed with 70% ethanol and resuspended at a concentration of 1 µg/µl. The annealing reaction was carried out with 4 µg of template DNA and 40 ng of primer in 1x annealing buffer in a final volume of 10 µl. The reaction was heated to 65 °C and slowly cooled to 37 °C.

In a separate tube, 1 µl of 0.1 M DTT, 2 µl of labeling mix, 0.75 µl of dH2O, 1 µl of [35S]dATP (10 µCi), and 0.25 µl of Sequenase™ (12 U/µl) were combined for each reaction. Five µl of this mix were added to each annealed primer-template tube and incubated for 5 min at room temperature. For each labeling reaction, 2.5 µl of each of the 4 termination mixes were added to a Terasaki plate and prewarmed at 37 °C. At the end of the incubation period, 3.5 µl of labeling reaction were added to each of the 4 termination mixes. After 5 min, 4 µl of stop solution were added to each reaction and the Terasaki plate was incubated at 80 °C for 10 min in an oven. The sequencing reactions were run on a 5% denaturing polyacrylamide gel. An acrylamide solution was prepared by adding 200 ml of 10x TBE buffer and 957 ml of dH2O to 100 g of acrylamide:bisacrylamide (29:1). A 5% polyacrylamide, 46% urea, 1x TBE gel was prepared by combining 38 ml of acrylamide solution and 28 g urea. Polymerization was initiated by the addition of 400 µl of 10% ammonium peroxodisulfate and 60 µl of TEMED. Gels were poured using silanized glass plates and sharktooth combs and run in 1x TBE buffer at 60 to 100 W for 2 to 4 hours (depending on the region to be read). Gels were transferred to Whatman blotting paper, dried at 80 °C for about 1 hour, and exposed to x-ray film at room temperature. Typical exposure time was 12 hours. The following solutions were used in these procedures: 5x Annealing buffer (200 mM Tris HCl, pH 7.5, 100 mM MgCl2, 250 mM NaCl); Labelling Mix (7.5 µM each dCTP, dGTP, and dTTP); Termination Mixes (80 µM each dNTP, 50 mM NaCl, 8 µM ddNTP (one each)); Stop solution (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol); 5x TBE (0.9 M Tris borate, 20 mM EDTA); Polyacrylamide solution (96.7 g polyacrylamide, 3.3 g bisacrylamide, 200 ml 1x TBE, 957 ml dH2O).
RNA isolation 

Cytoplasmic RNA was isolated from calcium phosphate transfected 293T cells 36 hours post transfection and from vaccinia-infected HeLa cells 16 hours post infection, essentially as described by Gilman (Gilman, Preparation of cytoplasmic RNA from tissue culture cells. In Current Protocols in Molecular Biology, Ausubel et al., eds., Wiley & Sons, New York, 1992). Briefly, cells were lysed in 400 µl lysis buffer, nuclei were spun out, and SDS and proteinase K were added to 0.2% and 0.2 mg/ml, respectively. The cytoplasmic extracts were incubated at 37 °C for 20 min, phenol/chloroform extracted twice, and precipitated. The RNA was dissolved in 100 µl buffer I and incubated at 37 °C for 20 min. The reaction was stopped by adding 25 µl stop buffer, and the RNA was precipitated again.
The following solutions were used in this procedure: Lysis Buffer (TRUSTEE containing 50 mM Tris pH 8.0, 100 mM NaCl, 5 mM MgCl2, 0.5% NP40); Buffer I (TRUSTEE buffer with 10 mM MgCl2, 1 mM DTT, 0.5 U/µl placental RNase inhibitor, 0.1 U/µl RNase-free DNase I); Stop buffer (50 mM EDTA, 1.5 M NaOAc, 1.0% SDS).
Slot blot analysis 

For slot blot analysis, 10 µg of cytoplasmic RNA was dissolved in 50 µl dH2O, to which 150 µl of 10x SSC/18% formaldehyde were added. The solubilized RNA was then incubated at 65 °C for 15 min and spotted onto a nitrocellulose membrane with a slot blot apparatus. Radioactively labeled probes of 1.5 kb gp120IIIb and syngp120mn fragments were used for hybridization. Each of the two fragments was random labeled in a 50 µl reaction with 10 µl of 5x oligo-labeling buffer, 8 µl of 2.5 mg/ml BSA, 4 µl of [α-32P]dCTP (20 µCi/µl; 6000 Ci/mmol), and 5 U of Klenow fragment. After 1 to 3 hours of incubation at 37 °C, 100 µl of TRUSTEE were added and unincorporated [α-32P]dCTP was removed using a G50 spin column. Activity was measured in a Beckman beta-counter, and equal specific activities were used for hybridization. Membranes were prehybridized for 2 hours and hybridized for 12 to 24 hours at 42 °C with 0.5 x 10^6 cpm probe per ml hybridization fluid. The membrane was washed twice (5 min) with washing buffer I at room temperature and for one hour in washing buffer II at 65 °C, and then exposed to x-ray film. Similar results were obtained using a 1.1 kb NotI/SfiI fragment of pCDM7 containing the 3' untranslated region. Control hybridizations were done in parallel with a random-labeled human beta-actin probe. RNA expression was quantitated by scanning the hybridized nitrocellulose membranes with a Molecular Dynamics phosphorimager.
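One common way to use the parallel beta-actin hybridization is to divide each sample's target signal by its actin signal and express the result relative to a reference construct. The Python sketch below illustrates that normalization under this assumption. The sample names and counts are hypothetical, and background subtraction, which a real analysis would include, is omitted.

```python
def normalized_expression(target: dict[str, float],
                          actin: dict[str, float],
                          reference: str) -> dict[str, float]:
    """Target/actin ratio for each sample, scaled so the reference equals 1."""
    ratios = {name: target[name] / actin[name] for name in target}
    ref = ratios[reference]
    return {name: r / ref for name, r in ratios.items()}


# Hypothetical phosphorimager counts for two constructs.
target_counts = {"native": 1000.0, "synthetic": 8000.0}
actin_counts = {"native": 2000.0, "synthetic": 2000.0}

for name, value in normalized_expression(target_counts, actin_counts, "native").items():
    print(f"{name}: {value:.1f}-fold")
```

Dividing by actin corrects for unequal RNA loading between slots before constructs are compared.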

The following solutions were used in this procedure: 5x Oligo-labeling buffer (250 mM Tris HCl, pH 8.0, 25 mM MgCl2, 5 mM β-mercaptoethanol, 2 mM dATP, 2 mM dGTP, mM dTTP, 1 M Hepes pH 6.6, 1 mg/ml hexanucleotides [dNTP]6); Hybridization Solution (0.05 M sodium phosphate, 250 mM NaCl, 7% SDS, 1 mM EDTA, 5% dextran sulfate, 50% formamide, 100 µg/ml denatured salmon sperm DNA); Washing buffer I (2x SSC, 0.1% SDS); Washing buffer II (0.5x SSC, 0.1% SDS); 20x SSC (3 M NaCl, 0.3 M Na3 citrate, pH adjusted to 7.0).

Vaccinia recombination

Vaccinia recombination used a modification of the method described by Romeo and Seed (Romeo and Seed, Cell, 64:1037, 1991). Briefly, CV1 cells at 70 to 90% confluency were infected with 1 to 3 µl of a wild-type vaccinia stock WR (2 x 10^8 pfu/ml) for 1 hour in culture medium without calf serum. After 24 hours, the cells were transfected by calcium phosphate with 25 µg TKG plasmid DNA per dish. After an additional 24 to 48 hours, the cells were scraped off the plate, spun down, and resuspended in a volume of 1 ml. After 3 freeze/thaw cycles, trypsin was added to 0.05 mg/ml and lysates were incubated for 20 min. A dilution series of 10, 1, and 0.1 µl of this lysate was used to infect small dishes (6 cm) of CV1 cells that had been pretreated with 12.5 µg/ml mycophenolic acid, 0.25 mg/ml xanthine, and 1.36 mg/ml hypoxanthine for 6 hours. Infected cells were cultured for 2 to 3 days and subsequently stained with the monoclonal antibody NEA9301 against gp120 and an alkaline phosphatase-conjugated secondary antibody. Cells were incubated with 0.33 mg/ml NBT and 0.16 mg/ml BCIP in AP buffer and finally overlaid with 1% agarose in PBS. Positive plaques were picked and resuspended in 100 µl Tris pH 9.0. The plaque purification was repeated once. To produce high titer stocks, the infection was slowly scaled up. Finally, one large plate of HeLa cells was infected with half of the virus of the previous round. Infected cells were detached in 3 ml of PBS, lysed with a Dounce homogenizer, and cleared of larger debris by centrifugation. VPE-8 recombinant vaccinia stocks were kindly provided by the AIDS repository, Rockville, MD, and express HIV-1 IIIB gp120 under the 7.5 mixed early/late promoter (Earl et al., J. Virol., 65:31, 1991). In all experiments with recombinant vaccinia, cells were infected at a multiplicity of infection of at least 10.
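A multiplicity of infection translates into a virus volume through the stock titer: volume = cells x MOI / titer. The Python sketch below does this arithmetic. The function name and the cell count are hypothetical, and the 2 x 10^8 pfu/ml titer is the one quoted above for the wild-type stock, used only as an example because titers of the recombinant stocks are not given.

```python
def inoculum_volume_ml(cells: float, moi: float, titer_pfu_per_ml: float) -> float:
    """Volume of virus stock (ml) needed to reach a given multiplicity of infection."""
    return cells * moi / titer_pfu_per_ml


# Hypothetical: 1e7 cells, MOI of 10, stock at 2e8 pfu/ml.
volume = inoculum_volume_ml(1e7, 10, 2e8)
print(f"{volume:.2f} ml of stock")
```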

The following solution was used in this procedure:

AP buffer (100 mM Tris HCl, pH 9.5, 100 mM NaCl, 5 mM MgCl2).
Cell culture

The monkey kidney carcinoma cell lines CV1 and Cos7, the human kidney carcinoma cell line 293T, and the human cervix carcinoma cell line HeLa were obtained from the American Type Culture Collection and were maintained in supplemented IMDM. They were kept on 10 cm tissue culture plates and typically split 1:5 to 1:20 every 3 to 4 days. The following medium was used in this procedure:

Supplemented IMDM (90% Iscove's modified Dulbecco Medium, 10% calf serum, iron-complemented, heat inactivated 30 min 56 °C, 0.3 mg/ml L-glutamine, 25 µg/ml gentamicin, 0.5 mM β-mercaptoethanol (pH adjusted with 5 M NaOH, 0.5 ml)).

Transfection 

Calcium phosphate transfection of 293T cells was performed by slowly adding 10 µg plasmid DNA in 250 µl 0.25 M CaCl2 to the same volume of 2x HEBS buffer while vortexing. After incubation for 10 to 30 min at room temperature, the DNA precipitate was added to a small dish of 50 to 70% confluent cells. In cotransfection experiments with rev, cells were transfected with 10 µg gp120IIIb, gp120IIIbrre, syngp120mnrre, or rTHY-1enveglrre and 10 µg of pCMVrev or CDM7 plasmid DNA.
The following solutions were used in this procedure: 2x HEBS buffer (280 mM NaCl, 10 mM KCl, 1.5 mM sterile filtered); 0.25 M CaCl2 (autoclaved).

Immunoprecipitation
After 48 to 60 hours, the medium was exchanged and cells were incubated for an additional 12 hours in Cys/Met-free medium containing 200 µCi of 35S-translabel. Supernatants were harvested and spun for 15 min at 3000 rpm to remove debris. After addition of the protease inhibitors leupeptin, aprotinin, and PMSF to 2.5 ng/ml, 50 µg/ml, and 100 µg/ml, respectively, 1 ml of supernatant was incubated with either 10 µl of packed protein A sepharose alone (rTHY-1enveglrre) or with protein A sepharose and 3 µg of a purified CD4/immunoglobulin fusion protein (kindly provided by Behring) (all gp120 constructs) at 4 °C for 12 hours on a rotator. Subsequently the protein A beads were washed 5 times for 5 to 15 min each time. After the final wash, 10 µl of loading buffer was added, and samples were boiled for 3 min and applied on 7% (all gp120 constructs) or 10% (rTHY-1enveglrre) SDS polyacrylamide gels (Tris pH 8.8 buffer in the resolving gel, Tris pH 6.8 buffer in the stacking gel, Tris-glycine running buffer; Maniatis et al., supra 1989). Gels were fixed in 10% acetic acid and 10% methanol, incubated with Amplify for 20 min, dried, and exposed for 12 hours.

The following buffers and solutions were used in this procedure: Wash buffer (100 mM Tris, pH 7.5, 150 mM NaCl, 5 mM CaCl2, 1% NP-40); 5x Running Buffer (125 mM Tris, 1.25 M glycine, 0.5% SDS); Loading buffer (10% glycerol, 4% SDS, 4% β-mercaptoethanol, 0.02% bromophenol blue).
Immunofluorescence

293T cells were transfected by calcium phosphate coprecipitation and analyzed for surface THY-1 expression after 3 days. After detachment with 1 mM EDTA/PBS, cells were stained with the monoclonal antibody OX-7 at a dilution of 1:250 at 4 °C for 20 min, washed with PBS, and subsequently incubated with a 1:500 dilution of a FITC-conjugated goat anti-mouse immunoglobulin antiserum. Cells were washed again, resuspended in 0.5 ml of fixing solution, and analyzed on an EPICS XL cytofluorometer (Coulter). The following solutions were used in this procedure: PBS (137 mM NaCl, 2.7 mM KCl, 4.3 mM Na2HPO4, 1.4 mM KH2PO4, pH adjusted to 7.4); Fixing solution (2% formaldehyde in PBS).
ELISA

The concentration of gp120 in culture supernatants was determined using CD4-coated ELISA plates and goat anti-gp120 antisera in the soluble phase. Supernatants of 293T cells transfected by calcium phosphate were harvested after 4 days, spun at 3000 rpm for 10 min to remove debris, and incubated for 12 hours at 4 °C on the plates. After 6 washes with PBS, 100 µl of goat anti-gp120 antisera diluted 1:200 were added for 2 hours. The plates were washed again and incubated for 2 hours with a peroxidase-conjugated rabbit anti-goat IgG antiserum at 1:1000. Subsequently the plates were washed and incubated for 30 min with 100 µl of substrate solution containing 2 mg/ml o-phenylenediamine in sodium citrate buffer. The reaction was stopped with 100 µl of 4 M sulfuric acid. Plates were read at 490 nm with a Coulter microplate reader. Purified recombinant gp120IIIb was used as a control. The following buffers and solutions were used in this procedure: Wash buffer (0.1% NP40 in PBS); Substrate solution (2 mg/ml o-phenylenediamine in sodium citrate buffer).
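Purified recombinant gp120IIIb serves as the control here. If the purified protein is also run as a dilution series, unknown supernatants can be read off the resulting standard curve. The Python sketch below assumes such a series and uses piecewise-linear interpolation between neighboring standards. The standard concentrations and OD490 values are hypothetical, and readings outside the standard range are rejected rather than extrapolated.

```python
def interpolate_concentration(od: float, standards: list[tuple[float, float]]) -> float:
    """Concentration for an OD490 reading by linear interpolation.

    standards -- (concentration, OD490) pairs; OD must increase strictly
                 with concentration over the range used.
    """
    pts = sorted(standards)  # order by concentration
    if any(b[1] <= a[1] for a, b in zip(pts, pts[1:])):
        raise ValueError("standard ODs must increase strictly with concentration")
    if not pts[0][1] <= od <= pts[-1][1]:
        raise ValueError("reading outside the standard range")
    for (c0, y0), (c1, y1) in zip(pts, pts[1:]):
        if y0 <= od <= y1:
            return c0 + (c1 - c0) * (od - y0) / (y1 - y0)
    raise AssertionError("unreachable")


# Hypothetical gp120 standards in ng/ml with their OD490 readings.
STANDARDS = [(0.0, 0.05), (10.0, 0.25), (50.0, 0.85), (100.0, 1.45)]
sample_od = 0.55
print(f"{interpolate_concentration(sample_od, STANDARDS):.1f} ng/ml in the assayed sample")
```

Supernatants that read above the top standard would be diluted and assayed again, with the result multiplied by the dilution factor.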
EXAMPLE II

A Synthetic Green Fluorescent Protein Gene

The efficacy of codon replacement for gp120 suggests that replacing non-preferred codons with less preferred or preferred codons (and replacing less preferred codons with preferred codons) will increase expression in mammalian cells of other proteins, e.g., other eukaryotic proteins.

The green fluorescent protein (GFP) of the jellyfish Aequorea victoria (Ward, Photochem. Photobiol. 4:1, 1979; Prasher et al., Gene 111:229, 1992; Cody et al., Biochem. 32:1212, 1993) has attracted attention recently for its possible utility as a marker or reporter for transfection and lineage studies (Chalfie et al., Science 263:802, 1994).

Examination of a codon usage table constructed from the native coding sequence of GFP showed that the GFP codons favored either A or U in the third position. The bias in this case favors A less than does the bias of gp120, but is substantial. A synthetic gene was created in which the natural GFP sequence was re-engineered in much the same manner as for gp120 (FIG. 11; SEQ ID NO:40). In addition, the translation initiation sequence of GFP was replaced with sequences corresponding to the translational initiation consensus. The expression of the resulting protein was contrasted with that of the wild type sequence, similarly engineered to bear an optimized translational initiation consensus (FIG. 10B and FIG. 10C). In addition, the effect of including the mutation Ser65-Thr, reported to improve excitation efficiency of GFP at 490 nm and hence preferred for fluorescence microscopy (Heim et al., Nature 373:663, 1995), was examined (FIG. 10D). Codon engineering conferred a significant increase in expression efficiency (and a concomitant increase in the percentage of cells apparently positive for transfection), and the combination of the Ser65-Thr mutation and codon optimization resulted in a DNA segment encoding a highly visible mammalian marker protein (FIG. 10D).
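The codon-usage observation above concerns the base at the third position of each codon. The Python sketch below tallies that base over a coding sequence read in frame from its first nucleotide. The function name is hypothetical, and the input is a short illustrative fragment, not a sequence reproduced from this document.

```python
from collections import Counter


def third_position_composition(cds: str) -> dict[str, float]:
    """Fraction of codons ending in each base (U counted as T).

    Reads complete codons in frame from the first nucleotide; a trailing
    partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    thirds = [cds[i + 2] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(thirds)
    return {base: counts[base] / len(thirds) for base in "ACGT"}


comp = third_position_composition("ATGAGTAAAGGAGAAGAACTTTTC")
print(comp, "A+T:", comp["A"] + comp["T"])
```

Running the same tally on a native sequence and its codon-optimized counterpart makes the shift away from A/T-ending codons directly visible.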

The above-described synthetic green fluorescent protein coding sequence was assembled in a similar manner as for gp120 from six fragments of approximately 120 bp each, using an assembly strategy that relied on the ability of the restriction enzymes BsaI and BbsI to cleave outside of their recognition sequences. Long oligonucleotides were synthesized which contained portions of the coding sequence for GFP embedded in flanking sequences encoding EcoRI and BsaI at one end, and BamHI and BbsI at the other end. Thus, each oligonucleotide has the configuration EcoRI/BsaI/GFP fragment/BbsI/BamHI. The restriction site ends generated by the BsaI and BbsI sites were designed to yield compatible ends that could be used to join adjacent GFP fragments. Each of the compatible ends was designed to be unique and non-self-complementary. The crude synthetic DNA segments were amplified by PCR, inserted between EcoRI and BamHI in pUC9, and sequenced. Subsequently the intact coding sequence was assembled in a six-fragment ligation, using insert fragments prepared with BsaI and BbsI. Two of six plasmids resulting from the ligation bore an insert of correct size, and one contained the desired full-length sequence. Mutation of Ser65 to Thr was accomplished by standard PCR-based mutagenesis, using a primer that overlapped a unique BssSI site in the synthetic GFP.
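The GFP assembly depends on two properties of these type IIS enzymes: they cut a fixed distance outside their recognition sites, so the 4-base overhang can be any sequence, and the set of overhangs must be unique and non-self-complementary. The Python sketch below illustrates both points. The cut positions used, BsaI GGTCTC(1/5) and BbsI GAAGAC(2/6), are the standard published specificities. The helper names and example sequences are hypothetical, and only sites in the forward orientation on the given strand are handled. The compatibility check also rejects an overhang that is the reverse complement of another overhang in the set, a common extra precaution that the text does not state explicitly.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Recognition site and top-strand cut offset (bases after the site).
# The bottom strand is cut 4 bases further on, leaving a 4-base 5' overhang.
TYPE_IIS = {"BsaI": ("GGTCTC", 1), "BbsI": ("GAAGAC", 2)}


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def overhang(seq: str, enzyme: str) -> str:
    """4-base 5' overhang left by the first forward-oriented site in seq."""
    site, offset = TYPE_IIS[enzyme]
    seq = seq.upper()
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"no {enzyme} site")
    start = i + len(site) + offset
    return seq[start:start + 4]


def overhangs_compatible(ends: list[str]) -> bool:
    """True if overhangs are unique, non-palindromic, and no overhang is the
    reverse complement of another (which could pair with the wrong partner)."""
    pool = ends + [revcomp(e) for e in ends]
    return len(set(pool)) == 2 * len(ends)


print(overhang("ttGGTCTCaAATGgcc", "BsaI"))    # site, 1 spacer base, overhang
print(overhang("GAAGACaaCTGAtt", "BbsI"))      # site, 2 spacer bases, overhang
print(overhangs_compatible(["AATG", "CTGA", "GCTT"]))
print(overhangs_compatible(["AATG", "GATC"]))  # GATC is its own reverse complement
```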

Codon optimization as a strategy for improved expression in mammalian cells 

The data presented here suggest that coding sequence re-engineering may have general utility for the improvement of expression of mammalian and non-mammalian eukaryotic genes in mammalian cells. The results obtained here with three unrelated proteins (HIV gp120, the rat cell surface antigen Thy-1, and green fluorescent protein from Aequorea victoria), together with those for human Factor VIII (see below), suggest that codon optimization may prove to be a fruitful strategy for improving the expression in mammalian cells of a wide variety of eukaryotic genes.

EXAMPLE III

Design of a Codon-Optimized Gene Expressing Human Factor VIII Lacking 
the Central B Domain 

A synthetic gene was designed that encodes mature human Factor VIII lacking amino acid residues 760 to 1639, inclusive (residues 779 to 1658, inclusive, of the precursor). The synthetic gene was created by choosing codons corresponding to those favored by highly expressed human genes. Some deviation from strict adherence to the favored codon pattern was made to allow unique restriction enzyme cleavage sites to be introduced throughout the gene to facilitate future manipulations. For preparation of the synthetic gene, the sequence was then divided into 28 segments of 150 basepairs and a 29th segment of 161 basepairs.
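Choosing codons favored by highly expressed human genes can be sketched as a table lookup. The Python sketch below uses the preferred codons listed in the Summary of the Invention, with Met and Trp (one codon each) added. Glu does not appear in that list; gag is assumed for it here because the synthetic gp120 oligonucleotides above use gag throughout. The function name is hypothetical, and the sketch does not attempt to reproduce the restriction-site deviations described above.

```python
# Preferred codons (Summary of the Invention); Met and Trp have a single codon.
# Glu is not in that list; gag is an assumption based on the gp120 oligos.
PREFERRED = {
    "A": "gcc", "R": "cgc", "N": "aac", "D": "gac", "C": "tgc",
    "Q": "cag", "E": "gag", "G": "ggc", "H": "cac", "I": "atc",
    "L": "ctg", "K": "aag", "M": "atg", "F": "ttc", "P": "ccc",
    "S": "agc", "T": "acc", "W": "tgg", "Y": "tac", "V": "gtg",
}


def back_translate(protein: str) -> str:
    """Encode a one-letter protein sequence using only preferred codons."""
    try:
        return "".join(PREFERRED[aa] for aa in protein.upper())
    except KeyError as err:
        raise ValueError(f"no codon defined for residue {err.args[0]!r}") from None


print(back_translate("MKV"))
```

In a real design, the output would then be scanned for unwanted restriction sites and adjusted codon by codon, as the text describes.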

The synthetic gene expressing human Factor VIII lacking the central B domain was constructed as follows. Twenty-nine pairs of template oligonucleotides (see below) were synthesized. The 5' template oligos were 105 bases long and the 3' oligos were 104 bases long (except for the last 3' oligo, which was 125 residues long). The template oligos were designed so that each annealing pair, composed of one 5' oligo and one 3' oligo, created a 19 basepair double-stranded region.
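The oligonucleotide lengths above fix the size of each amplified segment. Assuming the two PCR primers are the invariant 18-base ends of the template oligos (described in the following paragraph), the arithmetic works out as in the Python sketch below, which reproduces the 154 bp unique portion stated later in this example. The last figure counts each 4-base overhang once, as if it belonged to only one of the two neighboring fragments; that is one possible reading of the 150 bp segment length, not a statement from the text.

```python
def segment_lengths(fwd: int = 105, rev: int = 104, duplex: int = 19,
                    primer: int = 18, overhang: int = 4) -> dict[str, int]:
    """Sizes derived from one annealed pair of template oligonucleotides."""
    full = fwd + rev - duplex        # extended double-stranded product
    unique = full - 2 * primer       # minus the invariant primer ends
    contributed = unique - overhang  # shared overhang counted once
    return {"full": full, "unique": unique, "contributed": contributed}


print(segment_lengths())
```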

To facilitate the PCR and subsequent manipulations, the 5' ends of the oligo pairs were designed to be invariant over the first 18 residues, allowing a common pair of PCR primers to be used for amplification and allowing the same PCR conditions to be used for all pairs. The first 18 residues of each 5' member of the template pair were cgc gaa ttc gga aga ccc (SEQ ID NO:110), and the first 18 residues of each 3' member of the template pair were ggg gat cct cac gtc tca (SEQ ID NO:43).

Pairs of oligos were annealed and then extended and amplified by PCR in a reaction mixture as follows: templates were annealed at 200 ng/ml each in PCR buffer (10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, 100 ng/ml gelatin, pH 8.3). The PCR reactions contained 2 ng of the annealed template oligos, 0.5 µg of each of the two 18-mer primers (described below), 200 µM of each of the deoxynucleoside triphosphates, 10% by volume of DMSO, and PCR buffer as supplied by Boehringer Mannheim Biochemicals, in a final volume of 50 µl. After the addition of Taq polymerase (2.5 units, 0.5 µl; Boehringer Mannheim Biochemicals), amplifications were conducted on a Perkin-Elmer Thermal Cycler for 25 cycles (94 °C for 30 sec, 55 °C for 30 sec, and 72 °C for 30 sec). The final cycle was followed by a 10 minute extension at 72 °C.

The amplified fragments were digested with EcoRI and BamHI (cleaving at the 5' and 3' ends of the fragments, respectively) and ligated to a pUC9 derivative cut with EcoRI and BamHI.

Individual clones were sequenced and a collection of plasmids corresponding to the entire desired sequence was identified. The clones were then assembled by multifragment ligation, taking advantage of restriction sites at the 3' ends of the PCR primers, immediately adjacent to the amplified sequence. The 5' PCR primer contained a BbsI site and the 3' PCR primer contained a BsmBI site, positioned so that cleavage by the respective enzymes preceded the first nucleotide of the amplified portion and left a 4-base 5' overhang created by the first 4 bases of the amplified portion. Simultaneous digestion with BbsI and BsmBI thus liberated the amplified portion with unique 4-base 5' overhangs at each end, which contained none of the primer sequences.
In general these overhangs were not self-complementary, allowing multifragment ligation reactions to produce the desired product with high efficiency. The unique portion of each of the first 28 amplified oligonucleotide pairs was thereby 154 basepairs, and after digestion each gave rise to a 150 bp fragment with unique ends. The first and last fragments were not manipulated in this manner, however, since they had other restriction sites designed into them to facilitate insertion of the assembled sequence into an appropriate mammalian expression vector. The actual assembly proceeded as follows.

Assembly of the Synthetic Factor VIII Gene

Step 1: 29 Fragments Assembled to Form 10 Fragments.
The 29 pairs of oligonucleotides, which formed segments 1 to 29 when base-paired, are described below.

Plasmids carrying segments 1, 5, 9, 12, 16, 20, 24 and 27 were digested with EcoRI and BsmBI and the 170 bp fragments were isolated; plasmids bearing segments 2, 3, 6, 7, 10, 13, 17, 18, 21, 25, and 28 were digested with BbsI and BsmBI and the 170 bp fragments were isolated; and plasmids bearing segments 4, 8, 11, 14, 19, 22, 26 and 29 were digested with EcoRI and BbsI and the 2440 bp vector fragment was isolated. Fragments bearing segments 1, 2, 3 and 4 were then ligated to generate segment "A"; fragments bearing segments 5, 6, 7 and 8 were ligated to generate segment "B"; fragments bearing segments 9, 10 and 11 were ligated to generate segment "C"; fragments bearing segments 12, 13, and 14 were ligated to generate segment "D"; fragments bearing segments 16, 17, 18 and 19 were ligated to generate segment "F"; fragments bearing segments 20, 21 and 22 were ligated to generate segment "G"; fragments bearing segments 24, 25 and 26 were ligated to generate segment "I"; and fragments bearing segments 27, 28 and 29 were ligated to generate segment "J".

Step 2: Assembly of the 10 Resulting Fragments from Step 1 into Three Fragments

Plasmids carrying the segments "A", "D" and "G" were digested with
EcoRI and BsmBI, plasmids carrying the segments B, 15, 23, and I were
digested with BbsI and BsmBI, and plasmids carrying the segments C, F, and J
were digested with EcoRI and BbsI. Fragments bearing segments A, B, and C
were ligated to generate segment "K"; fragments bearing segments D, 15, and F
were ligated to generate segment "O"; and fragments bearing segments G, 23, I,
and J were ligated to generate segment "P".

Step 3: Assembly of the Final Three Pieces
The plasmid bearing segment K was digested with EcoRI and
BsmBI, the plasmid bearing segment O was digested with BbsI and BsmBI,
and the plasmid bearing segment P was digested with EcoRI and BbsI. The three
resulting fragments were ligated to generate segment S.

Step 4: Insertion of the Synthetic Gene in a Mammalian Expression Vector

The plasmid bearing segment S was digested with NheI and NotI and
inserted between the NheI and EagI sites of plasmid CDSlNEgl to generate
plasmid cd51sf8b.
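The hierarchical scheme of Steps 1 to 4 can be captured as a small tree. The sketch below is illustrative only: the dictionary layout and the names `ASSEMBLY`, `flatten` and `digest_plan` are hypothetical, and the enzyme rule simply restates the pattern used in Steps 1 to 3 (first part EcoRI plus BsmBI, internal parts BbsI plus BsmBI, last part EcoRI plus BbsI, the plasmid from which the vector fragment is taken in Step 1).

```python
# Each intermediate segment lists its parts in 5'->3' order; integers are the
# 29 primary segments. Segments 15 and 23 enter directly at Step 2.
ASSEMBLY = {
    "A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11],
    "D": [12, 13, 14], "F": [16, 17, 18, 19], "G": [20, 21, 22],
    "I": [24, 25, 26], "J": [27, 28, 29],
    "K": ["A", "B", "C"], "O": ["D", 15, "F"], "P": ["G", 23, "I", "J"],
    "S": ["K", "O", "P"],
}


def flatten(node):
    """Expand an intermediate segment into its primary segments, in order."""
    if isinstance(node, int):
        return [node]
    return [seg for part in ASSEMBLY[node] for seg in flatten(part)]


def digest_plan(parent):
    """Enzyme pair applied to each part's plasmid when building `parent`."""
    parts = ASSEMBLY[parent]
    plan = {}
    for i, part in enumerate(parts):
        if i == 0:
            plan[part] = ("EcoRI", "BsmBI")   # 5'-most insert
        elif i == len(parts) - 1:
            plan[part] = ("EcoRI", "BbsI")    # last part, keeps the vector
        else:
            plan[part] = ("BbsI", "BsmBI")    # internal insert
    return plan


print(flatten("S") == list(range(1, 30)))
print(digest_plan("O"))
```

Flattening segment S recovers segments 1 to 29 in order, which is a quick consistency check on the grouping given in the text.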

Sequencing and Correction of the Synthetic Factor VIII Gene

After assembly of the synthetic gene it was discovered that there
were two undesired residues encoded in the sequence. One was an Arg residue
at 749, which is present in the GenBank sequence entry originating from
Genentech but is not in the sequence reported by Genentech in the literature.
The other was an Ala residue at 146, which should have been Pro. This
mutation arose at an unidentified step subsequent to the sequencing of the 29
constituent fragments. The Pro749Arg mutation was corrected by
incorporating the desired change in a PCR primer (ctg ctt ctg acg cgt gct ggg
gtg gcg gga gtt; SEQ ID NO:44) that included the MluI site at position 2335 of
the sequence below (sequence of the HindIII to NotI segment) and amplifying
between that primer and a primer (ctg ctg aaa gtc tcc agc tgc; SEQ ID NO:45)
5' to the SgrAI site at 2225. The SgrAI to MluI fragment was then inserted into
the expression vector at the cognate sites, and the resulting sequence
correction was verified by sequencing. The Pro146Ala mutation was
corrected by incorporating the desired sequence change in an oligonucleotide
(ggc agg tgc tta agg aga acg gcc cta tgg cca; SEQ ID NO:46) bearing the AflII
site at residue 504, amplifying by PCR between that oligonucleotide and the
primer having the sequence cgt tgt tct tca tac gcg tct ggg gct cct cgg ggc
(SEQ ID NO:109), cutting the resulting PCR fragment with AflII and AvrII
(at residue 989), inserting the corrected fragment into the expression vector,
and confirming the construction by sequencing.
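Checking that a mutagenic primer actually carries the restriction site it is meant to carry is a simple string search. A sketch, assuming the standard recognition sequences (MluI ACGCGT, AflII CTTAAG, NheI GCTAGC, SgrAI CRCCGGYG in IUPAC notation) and the primer sequences of SEQ ID NO:44 and NO:46; all four sites are palindromic, so scanning one strand finds them in either orientation. The names `IUPAC`, `SITES` and `find_sites` are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the sites below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Standard recognition sequences (all palindromic).
SITES = {"MluI": "ACGCGT", "AflII": "CTTAAG",
         "NheI": "GCTAGC", "SgrAI": "CRCCGGYG"}


def find_sites(seq, site):
    """0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in site)
    s = seq.replace(" ", "").upper()
    # A zero-width lookahead lets overlapping matches be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", s)]


primer_44 = "ctg ctt ctg acg cgt gct ggg gtg gcg gga gtt"  # SEQ ID NO:44
primer_46 = "ggc agg tgc tta agg aga acg gcc cta tgg cca"  # SEQ ID NO:46

print(find_sites(primer_44, SITES["MluI"]))   # [9]
print(find_sites(primer_46, SITES["AflII"]))  # [8]
```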

Construction of a Matched Native Gene Expressing Human Factor VIII
Lacking the Central B Domain

A matched Factor VIII B domain deletion expression plasmid having
the native codon sequence was constructed by introducing an NheI site at the
5' end of the mature coding sequence using primer cgc caa ggg cta gcc gcc acc
aga aga tac tac ctg ggt (SEQ ID NO:47), amplifying between that primer and
the primer att cgt agt tgg ggt tcc tct gga cag (corresponding to residues 1067
to 1093 of the sequence shown below), cutting with NheI and AflII (residue 345
in the sequence shown below) and inserting the resulting fragment into an
appropriately cleaved plasmid bearing native Factor VIII. The B domain
deletion was created by overlap PCR using ctg tat ttg atg aga acc g
(corresponding to residues 1813 to 1831 below) and caa gac tgg tgg ggt ggc att
aaa ttg ctt t (SEQ ID NO:48) (2342 to 2372 on the complement below) for the
5' end of the overlap, and aat gcc acc cca cca gtc ttg aaa cgc ca (SEQ ID
NO:49) (2352 to 2380 on the sequence below) and cat ctg gat att gca ggg ag
(SEQ ID NO:50) (3145 to 3164) for the 3' end. The products of the two
individual PCR reactions were then mixed and reamplified using the outermost
primers; the resulting fragment was cleaved with Asp718 (a KpnI isoschizomer,
1837 on the sequence below) and PflMI (3100 on the sequence below) and
inserted into the appropriately cleaved expression plasmid bearing native
Factor VIII.
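In overlap PCR the two inner primers must share a complementary stretch so that the two products can anneal to each other and be extended. A sketch that measures that stretch for SEQ ID NO:48 and SEQ ID NO:49, assuming both are written 5' to 3' as given; `revcomp` and `overlap_length` are hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (spaces ignored)."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


def overlap_length(left_reverse_primer, right_forward_primer):
    """Longest 3' end of the left product's top strand (the reverse
    complement of its reverse primer) that equals the 5' end of the right
    product's forward primer."""
    a = revcomp(left_reverse_primer)
    b = right_forward_primer.replace(" ", "").upper()
    for n in range(min(len(a), len(b)), 0, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


seq_48 = "caa gac tgg tgg ggt ggc att aaa ttg ctt t"  # SEQ ID NO:48
seq_49 = "aat gcc acc cca cca gtc ttg aaa cgc ca"     # SEQ ID NO:49

print(overlap_length(seq_48, seq_49))  # 21
```

The 21-nucleotide overlap agrees with the coordinates given in the text (2342 to 2372 and 2352 to 2380 share positions 2352 to 2372).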

The complete sequence (SEQ ID NO:41) of the native human Factor
VIII gene deleted for the central B region is presented in Figure 12. The
complete sequence (SEQ ID NO:42) of the synthetic Factor VIII gene deleted
for the central B region is presented in Figure 13.
Preparation and assay of expression plasmids
Two independent isolates of the native and four independent
isolates of the synthetic Factor VIII expression plasmid were
separately propagated in bacteria, and their DNA was prepared by CsCl buoyant
density centrifugation followed by phenol extraction. Analysis of the
supernatants of COS cells transfected with the plasmids showed that the
synthetic gene gave rise to approximately four times as much Factor VIII as did
the native gene.

COS cells were transfected with 5 µg of each Factor VIII
construct per 6 cm dish using the DEAE-dextran method. At 72 hours post-
transfection, 4 ml of fresh medium containing 10% calf serum was added to
each plate. A sample of medium was taken from each plate 12 hr later.
Samples were tested by ELISA using a mouse anti-human factor VIII light chain
monoclonal antibody and a peroxidase-conjugated goat anti-human factor VIII
polyclonal antibody. Purified human plasma factor VIII was used as a
standard. Cells transfected with the synthetic Factor VIII gene construct
expressed 138 ± 20.2 ng/ml (equivalent ng/ml non-deleted Factor VIII) of
Factor VIII (n=4), while cells transfected with the native Factor VIII gene
expressed 33.5 ± 0.7 ng/ml (equivalent ng/ml non-deleted Factor VIII) of
Factor VIII (n=2).
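The "approximately four times" figure follows directly from the ELISA means. A small worked sketch: it assumes the ± values are independent standard deviations (the text does not say whether they are standard deviations or standard errors) and uses first-order error propagation for a ratio, sd_r = r·sqrt((sd_a/a)² + (sd_b/b)²). The function name is hypothetical.

```python
import math


def ratio_with_spread(mean_a, sd_a, mean_b, sd_b):
    """Ratio a/b and a first-order spread, assuming independent errors."""
    r = mean_a / mean_b
    rel = math.sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)
    return r, r * rel


# Synthetic (n=4) versus native (n=2) Factor VIII expression, in ng/ml.
ratio, spread = ratio_with_spread(138.0, 20.2, 33.5, 0.7)
print(f"{ratio:.2f} +/- {spread:.2f}")  # 4.12 +/- 0.61
```

Almost all of the spread comes from the synthetic-gene measurement, whose relative variation (about 15%) is much larger than that of the native-gene measurement (about 2%).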

The following template oligonucleotides were used for construction
of the synthetic Factor VIII gene.

rl bbs 1 for (gcta)

cgc gaa ttc gga aga ccc gct agc cgc cac 1 rl
ccg ccg cta cta cct ggg cgc cgt gga gct
gtc ctg gga cta cat gca gag cga cct ggg
cga gct ccc cgt gga (SEQ ID NO:51)

ggg gat cct cac gtc tca ggt ttt ctt gta 1 bam
cac cac gct ggt gtt gaa ggg gaa gct ctt
ggg cac gcg ggg ggg gaa gcg ggc gtc cac
ggg gag ctc gcc ca (SEQ ID NO:52)

rl bbs 2 for (aacc)

cgc gaa ttc gga aga ccc aac cct gtt cgt 2 rl
gga gtt cac cga cca cct gtt caa cat tgc
caa gcc gcg ccc ccc ctg gat ggg cct gct
ggg ccc cac cat cca (SEQ ID NO:53)

ggg gat cct cac gtc tca gtg cag gct gac 2 bam
ggg gtg gct ggc cat gtt ctt cag ggt gat
cac cac ggt gtc gta cac ctc ggc ctg gat
ggt ggg gcc cag ca (SEQ ID NO:54)

rl bbs 3 for (gcac)

cgc gaa ttc gga aga ccc gca cgc cgt ggg 3 rl
cgt gag cta ctg gaa ggc cag cga ggg cgc
cga gta cga cga cca gac gtc cca gcg cga
gaa gga gga cga caa (SEQ ID NO:55)

ggg gat cct cac gtc tca gct ggc cat agg 3 bam
gcc gtt ctc ctt aag cac ctg cca cac gta
ggt gtg gct ccc ccc cgg gaa cac ctt gtc
gtc ctc ctt ctc gc (SEQ ID NO:56)

rl bbs 4 for (cagc)

cgc gaa ttc gga aga ccc cag cga ccc cct 4 rl
gtg cct gac cta cag cta cct gag cca cgt
gga cct ggt gaa gga tct gaa cag cgg gct
gat cgg cgc cct gct (SEQ ID NO:57)

ggg gat cct cac gtc tca gaa cag cag gat 4 bam
gaa ctt gtg cag ggt ctg ggt ttt ctc ctt
ggc cag gct gcc ctc gcg aca cac cag cag
ggc gcc gat cag cc (SEQ ID NO:58)
rl bbs 5 for (gttc)

cgc gaa ttc gga aga ccc gtt cgc cgt gtt 5 rl
cga cga ggg gaa gag ctg gca cag cga gac
taa gaa cag cct gat gca gga ccg cga cgc
cgc cag cgc ccg cgc (SEQ ID NO:59)

ggg gat cct cac gtc tca gtg gca gcc gat 5 bam
cag gcc ggg cag gct gcg gtt cac gta gcc
gtt aac ggt gtg cat ctt ggg cca ggc gcg
ggc gct ggc ggc gt (SEQ ID NO:60)

rl bbs 6 for (ccac)

cgc gaa ttc gga aga ccc cca ccg caa gag 6 rl
cgt gta ctg gca cgt cat cgg cat ggg cac
cac ccc tga ggt gca cag cat ctt cct gga
ggg cca cac ctt cct (SEQ ID NO:61)

ggg gat cct cac gtc tca cag ggt ctg ggc 6 bam
agt cag gaa ggt gat ggg gct gat ctc cag
gct ggc ctg gcg gtg gtt gcg cac cag gaa
ggt gtg gcc ctc ca (SEQ ID NO:62)

rl bbs 7 for (cctg)

cgc gaa ttc gga aga ccc cct gct gat gga 7 rl
cct agg cca gtt cct gct gtt ctg cca cat
cag cag cca cca gca cga cgg cat gga ggc
tta cgt gaa ggt gga (SEQ ID NO:63)

ggg gat cct cac gtc tca gtc gtc gtc gta 7 bam
gtc ctc ggc ctc ctc gtt gtt ctt cat gcg
cag ctg ggg ctc ctc ggg gca gct gtc cac
ctt cac gta agc ct (SEQ ID NO:64)

rl bbs 8 for (cgac)

cgc gaa ttc gga aga ccc cga cct gac cga 8 rl
cag cga gat gga tgt cgt acg ctt cga cga
cga caa cag ccc cag ctt cat cca gat ccg
cag cgt ggc caa gaa (SEQ ID NO:65)

ggg gat cct cac gtc tca tac tag cgg ggc 8 bam
gta gtc cca gtc ctc ctc ctc ggc ggc gat
gta gtg cac cca ggt ctt agg gtg ctt ctt
ggc cac gct gcg ga (SEQ ID NO:66)

rl bbs 9 for (agta)

cgc gaa ttc gga aga ccc agt act ggc ccc 9 rl
cga cga ccg cag cta caa gag cca gta cct
gaa caa cgg ccc cca gcg cat cgg ccg caa
gta caa gaa ggt gcg (SEQ ID NO:67)

ggg gat cct cac gtc tca gag gat gcc gga 9 bam
ctc gtg ctg gat ggc ctc gcg ggt ctt gaa
agt ctc gtc ggt gta ggc cat gaa gcg cac
ctt ctt gta ctt gc (SEQ ID NO:68)
rl bbs 10 for (cctc)

cgc gaa ttc gga aga ccc cct cgg ccc cct 10 rl
gct gta cgg cga ggt ggg cga cac cct gct
gat cat ctt caa gaa cca ggc cag cag gcc
cta caa cat cta ccc (SEQ ID NO:69)

ggg gat cct cac gtc tca ctt cag gtg ctt 10 bam
cac gcc ctt ggg cag gcg gcg gct gta cag
ggg gcg cac gtc ggt gat gcc gtg ggg gta
gat gtt gta ggg cc (SEQ ID NO:70)

rl bbs 11 for (gaag)

cgc gaa ttc gga aga ccc gaa gga ctt ccc 11 rl
cat cct gcc cgg cga gat ctt caa gta caa
gtg gac cgt gac cgt gga gga cgg ccc cac
caa gag cga ccc ccg (SEQ ID NO:71)

ggg gat cct cac gtc tca gcc gat cag tcc 11 bam
gga ggc cag gtc gcg ctc cat gtt cac gaa
gct gct gta gta gcg ggt cag gca gcg ggg
gtc gct ctt ggt gg (SEQ ID NO:72)



rl bbs 12 for (cggc)

cgc gaa ttc gga aga ccc cgg ccc cct gct 12 rl
gat ctg cta caa gga gag cgt gga cca gcg
cgg caa cca gat cat gag cga caa gcg caa
cgt gat cct gtt cag (SEQ ID NO:73)

ggg gat cct cac gtc tca agc ggg gtt ggg 12 bam
cag gaa gcg ctg gat gtt ctc ggt cag ata
cca gct gcg gtt ctc gtc gaa cac gct gaa
cag gat cac gtt gc (SEQ ID NO:74)

rl bbs 13 for (cgct)

cgc gaa ttc gga aga ccc cgc tgg cgt gca 13 rl
gct gga aga tcc cga gtt cca ggc cag caa
cat cat gca cag cat caa cgg cta cgt gtt
cga cag cct gca gct (SEQ ID NO:75)

ggg gat cct cac gtc tca cag gaa gtc ggt 13 bam
ctg ggc gcc gat gct cag gat gta cca gta
ggc cac ctc atg cag gca cac gct cag ctg
cag gct gtc gaa ca (SEQ ID NO:76)

rl bbs 14 for (cctg)

cgc gaa ttc gga aga ccc cct gag cgt gtt 14 rl
ctt ctc cgg gta tac ctt caa gca caa gat
ggt gta cga gga cac cct gac cct gtt ccc
ctt ctc cgg cga gac (SEQ ID NO:77)

ggg gat cct cac gtc tca gtt gcg gaa gtc 14 bam
gct gtt gtg gca gcc cag aat cca cag gcc
ggg gtt ctc cat aga cat gaa cac agt ctc
gcc gga gaa ggg ga (SEQ ID NO:78)
rl bbs 15 for (caac)

cgc gaa ttc gga aga ccc caa ccg cgg cat 15 rl
gac tgc cct gct gaa agt ctc cag ctg cga
caa gaa cac cgg cga cta cta cga gga cag
cta cga gga cat ctc (SEQ ID NO:79)

ggg gat cct cac gtc tca gcg gtg gcg gga 15 bam
gtt ttg gga gaa gga gcg ggg ctc gat ggc
gtt gtt ctt gga cag cag gta ggc gga gat
gtc ctc gta gct gt (SEQ ID NO:80)

rl bbs 16 for (ccgc)

cgc gaa ttc gga aga ccc ccg cag cac gcg 16 rl
tca gaa gca gtt caa cgc cac ccc ccc cgt
gct gaa gcg cca cca gcg cga gat cac ccg
cac cac cct gca aag (SEQ ID NO:81)

ggg gat cct cac gtc tca gat gtc gaa gtc 16 bam
ctc ctt ctt cat ctc cac gct gat ggt gtc
gtc gta gtc gat ctc ctc ctg gtc gct ttg
cag ggt ggt gcg gg (SEQ ID NO:82)



rl bbs 17 for (catc)

cgc gaa ttc gga aga ccc cat cta cga cga 17 rl
gga cga gaa cca gag ccc ccg ctc ctt cca
aaa gaa aac ccg cca cta ctt cat cgc cgc
cgt gga gcg cct gtg (SEQ ID NO:83)

ggg gat cct cac gtc tca ctg ggg cac gct 17 bam
gcc gct ctg ggc gcg gtt gcg cag gac gtg
ggg gct gct gct cat gcc gta gtc cca cag
gcg ctc cac ggc gg (SEQ ID NO:84)

rl bbs 18 for (ccag)

cgc gaa ttc gga aga ccc cca gtt caa gaa 18 rl
ggt ggt gtt cca gga gtt cac cga cgg cag
ctt cac cca gcc cct gta ccg cgg cga gct
gaa cga gca cct ggg (SEQ ID NO:85)

ggg gat cct cac gtc tca ggc ttg gtt gcg 18 bam
gaa ggt cac cat gat gtt gtc ctc cac ctc
ggc gcg gat gta ggg gcc gag cag gcc cag
gtg ctc gtt cag ct (SEQ ID NO:86)

rl bbs 19 for (agcc)

cgc gaa ttc gga aga ccc agc ctc ccg gcc 19 rl
cta ctc ctt cta ctc ctc cct gat cag cta
cga gga gga cca gcg cca ggg cgc cga gcc
ccg caa gaa ctt cgt (SEQ ID NO:87)

ggg gat cct cac gtc tca ctc gtc ctt ggt 19 bam
ggg ggc cat gtg gtg ctg cac ctt cca gaa
gta ggt ctt agt ctc gtt ggg ctt cac gaa
gtt ctt gcg ggg ct (SEQ ID NO:88)
rl bbs 20 for (cgag)

cgc gaa ttc gga aga ccc cga gtt cga ctg 20 rl
caa ggc ctg ggc cta ctt cag cga cgt gga
cct gga gaa gga cgt gca cag cgg cct gat
cgg ccc cct gct ggt (SEQ ID NO:89)

ggg gat cct cac gtc tca gaa cag ggc aaa 20 bam
ttc ctg cac agt cac ctg cct ccc gtg ggg
ggg gtt cag ggt gtt ggt gtg gca cac cag
cag ggg gcc gat ca (SEQ ID NO:90)

rl bbs 21 for (gttc)

cgc gaa ttc gga aga ccc gtt ctt cac cat 21 rl
ctt cga cga gac taa gag ctg gta ctt cac
cga gaa cat gga gcg caa ctg ccg cgc ccc
ctg caa cat cca gat (SEQ ID NO:91)

ggg gat cct cac gtc tca cag ggt gtc cat 21 bam
gat gta gcc gtt gat ggc gtg gaa gcg gta
gtt ctc ctt gaa ggt ggg atc ttc cat ctg
gat gtt gca ggg gg (SEQ ID NO:92)

rl bbs 22 for (cctg)

cgc gaa ttc gga aga ccc cct gcc cgg cct 22 rl
ggt gat ggc cca gga cca gcg cat ccg ctg
gta cct gct gtc tat ggg cag caa cga gaa
cat cca cag cat cca (SEQ ID NO:93)

ggg gat cct cac gtc tca gta cag gtt gta 22 bam
cag ggc cat ctt gta ctc ctc ctt ctt gcg
cac ggt gaa aac gtg gcc gct gaa gtg gat
gct gtg gat gtt ct (SEQ ID NO:94)

rl bbs 23 for (gtac)

cgc gaa ttc gga aga ccc gta ccc cgg cgt 23 rl
gtt cga gac tgt gga gat gct gcc cag caa
ggc cgg gat ctg gcg cgt gga gtg cct gat
cgg cga gca cct gca (SEQ ID NO:95)

ggg gat cct cac gtc tca gct ggc cat gcc 23 bam
cag ggg ggt ctg gca ctt gtt gct gta cac
cag gaa cag ggt gct cat gcc ggc gtg cag
gtg ctc gcc gat ca (SEQ ID NO:96)

rl bbs 24 for (cagc)

cgc gaa ttc gga aga ccc cag cgg cca cat 24 rl
ccg cga ctt cca gat cac cgc cag cgg cca
gta cgg cca gtg ggc tcc caa gct ggc ccg
cct gca cta cag cgg (SEQ ID NO:97)

ggg gat cct cac gtc tca cat ggg ggc cag 24 bam
cag gtc cac ctt gat cca gga gaa ggg ctc
ctt ggt cga cca ggc gtt gat gct gcc gct
gta gtg cag gcg gg (SEQ ID NO:98)
rl bbs 25 for (catg)

cgc gaa ttc gga aga ccc cat gat cat cca 25 rl
cgg cat caa gac cca ggg cgc ccg cca gaa
gtt cag cag cct gta cat cag cca gtt cat
cat cat gta ctc tct (SEQ ID NO:99)

ggg gat cct cac gtc tca gtt gcc gaa gaa 25 bam
cac cat cag ggt gcc ggt gct gtt gcc gcg
gta ggt ctg cca ctt ctt gcc gtc tag aga
gta cat gat gat ga (SEQ ID NO:100)

rl bbs 26 for (caac)

cgc gaa ttc gga aga ccc caa cgt gga cag 26 rl
cag cgg cat caa gca caa cat ctt caa ccc
ccc cat cat cgc ccg cta cat ccg cct gca
ccc cac cca cta cag (SEQ ID NO:101)

ggg gat cct cac gtc tca gcc cag ggg cat 26 bam
gct gca gct gtt cag gtc gca gcc cat cag
ctc cat gcg cag ggt gct gcg gat gct gta
gtg ggt ggg gtg ca (SEQ ID NO:102)

rl bbs 27 for (gggc)

cgc gaa ttc gga aga ccc ggg cat gga gag 27 rl
caa ggc cat cag cga cgc cca gat cac cgc
ctc cag cta ctt cac caa cat gtt cgc cac
ctg gag ccc cag caa (SEQ ID NO:103)

ggg gat cct cac gtc tca cca ctc ctt ggg 27 bam
gtt gtt cac ctg ggg gcg cca ggc gtt gct
gcg gcc ctg cag gtg cag gcg ggc ctt gct
ggg gct cca ggt gg (SEQ ID NO:104)

rl bbs 28 for (gtgg)

cgc gaa ttc gga aga ccc gtg gct gca ggt 28 rl
gga ctt cca gaa aac cat gaa ggt gac tgg
cgt gac cac cca ggg cgt caa gag cct gct
gac cag cat gta cgt (SEQ ID NO:105)

ggg gat cct cac gtc tca ctt gcc gtt ttg 28 bam
gaa gaa cag ggt cca ctg gtg gcc gtc ctg
gct gct gct gat cag gaa ctc ctt cac gta
cat gct ggt cag ca (SEQ ID NO:106)

rl bbs 29 for (caag)

cgc gaa ttc gga aga ccc caa ggt gaa ggt 29 rl
gtt cca ggg caa cca gga cag ctt cac acc
ggt cgt gaa cag cct gga ccc ccc cct gct
gac ccg cta cct gcg (SEQ ID NO:107)

ggg gat cct cac gtc tca gcg gcc gct tca 29 bam
gta cag gtc ctg ggc ctc gca gcc cag cac
ctc cat gcg cag ggc gat ctg gtg cac cca
gct ctg ggg gtg gat gcg cag gta gcg ggt
cag ca (SEQ ID NO:108)
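The template oligonucleotides can be cross-checked against each other: at each junction, the BsmBI overhang left on segment n by its reverse primer should be the reverse complement of the BbsI overhang left on segment n+1 by its forward primer. This sketch assumes the standard cut positions (BbsI 2/6, BsmBI 1/5), uses the first ten codons of three primer pairs from the list above, and uses hypothetical helper names. Note that the 22/23 junction (gtac) is palindromic, consistent with the statement that the overhangs were only in general not self-complementary.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# (recognition site, top-strand cut, bottom-strand cut), counted 3' of the site
TYPE_IIS = {"BbsI": ("GAAGAC", 2, 6), "BsmBI": ("CGTCTC", 1, 5)}


def revcomp(seq):
    return seq.translate(COMPLEMENT)[::-1]


def overhang(primer, enzyme):
    """4-base 5' overhang left by `enzyme` on a primer written 5'->3'."""
    site, top, bottom = TYPE_IIS[enzyme]
    s = primer.replace(" ", "").upper()
    end = s.index(site) + len(site)  # raises ValueError if the site is absent
    return s[end + top:end + bottom]


def junction_ok(reverse_primer_n, forward_primer_next):
    """True if the BsmBI end of segment n can anneal to the BbsI end of n+1."""
    return overhang(reverse_primer_n, "BsmBI") == revcomp(
        overhang(forward_primer_next, "BbsI"))


def self_complementary(oh):
    """A palindromic overhang can also ligate to a copy of itself."""
    return oh == revcomp(oh)


# (reverse primer of segment n, forward primer of segment n+1)
junctions = {
    (1, 2): ("ggg gat cct cac gtc tca ggt ttt ctt gta",
             "cgc gaa ttc gga aga ccc aac cct gtt cgt"),
    (3, 4): ("ggg gat cct cac gtc tca gct ggc cat agg",
             "cgc gaa ttc gga aga ccc cag cga ccc cct"),
    (22, 23): ("ggg gat cct cac gtc tca gta cag gtt gta",
               "cgc gaa ttc gga aga ccc gta ccc cgg cgt"),
}

for (n, m), (rev, fwd) in junctions.items():
    oh = overhang(fwd, "BbsI")
    print(n, m, junction_ok(rev, fwd), oh, self_complementary(oh))
```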
The codon usage for the synthetic and native genes described above
is presented in Tables 3 and 4, respectively:



TABLE 3: Codon Frequency of the Synthetic Factor VIII B Domain Deleted Gene

AA    Codon   Number   /1000  Fraction
Gly   GGG       7.00    4.82      0.09
Gly   GGA       1.00    0.69      0.01
Gly   GGT       0.00    0.00      0.00
Gly   GGC      74.00   50.93      0.90
Glu   GAG      81.00   55.75      0.96
Glu   GAA       3.00    2.06      0.04
Asp   GAT       4.00    2.75      0.05
Asp   GAC      78.00   53.68      0.95
Val   GTG      77.00   52.99      0.88
Val   GTA       2.00    1.38      0.02
Val   GTT       2.00    1.38      0.02
Val   GTC       7.00    4.82      0.08
Ala   GCG       0.00    0.00      0.00
Ala   GCA       0.00    0.00      0.00
Ala   GCT       3.00    2.06      0.04
Ala   GCC      67.00   46.11      0.96
Arg   AGG       2.00    1.38      0.03
Arg   AGA       0.00    0.00      0.00
Ser   AGT       0.00    0.00      0.00
Ser   AGC      97.00   66.76      0.81
Lys   AAG      75.00   51.62      0.94
Lys   AAA       5.00    3.44      0.06
Asn   AAT       0.00    0.00      0.00
Asn   AAC      63.00   43.36      1.00
Met   ATG      43.00   29.59      1.00
Ile   ATA       0.00    0.00      0.00
Ile   ATT       2.00    1.38      0.03
Ile   ATC      72.00   49.55      0.97
Thr   ACG       2.00    1.38      0.02
Thr   ACA       1.00    0.69      0.01
Thr   ACT      10.00    6.88      0.12
Thr   ACC      70.00   48.18      0.84
Trp   TGG      28.00   19.27      1.00
End   TGA       1.00    0.69      1.00
Cys   TGT       1.00    0.69      0.05
Cys   TGC      18.00   12.39      0.95
End   TAG       0.00    0.00      0.00
End   TAA       0.00    0.00      0.00
Tyr   TAT       2.00    1.38      0.03
Tyr   TAC      66.00   45.42      0.97
Leu   TTG       0.00    0.00      0.00
Leu   TTA       0.00    0.00      0.00
Phe   TTT       1.00    0.69      0.01
Phe   TTC      76.00   52.31      0.99
Ser   TCG       1.00    0.69      0.01
Ser   TCA       0.00    0.00      0.00
Ser   TCT       3.00    2.06      0.03
Ser   TCC      19.00   13.08      0.16
Arg   CGG       1.00    0.69      0.01
Arg   CGA       0.00    0.00      0.00
Arg   CGT       1.00    0.69      0.01
Arg   CGC      69.00   47.49      0.95
Gln   CAG      62.00   42.67      0.93
Gln   CAA       5.00    3.44      0.07
His   CAT       1.00    0.69      0.02
His   CAC      50.00   34.41      0.98
Leu   CTG     118.00   81.21      0.94
Leu   CTA       3.00    2.06      0.02
Leu   CTT       1.00    0.69      0.01
Leu   CTC       3.00    2.06      0.02
Pro   CCG       4.00    2.75      0.05
Pro   CCA       0.00    0.00      0.00
Pro   CCT       3.00    2.06      0.04
Pro   CCC      68.00   46.80      0.91
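Each row of Tables 3 and 4 follows from the raw codon counts: /1000 is the count per thousand codons of the whole coding sequence, and Fraction is the count divided by all codons for that amino acid. A sketch, assuming the total of 1,453 codons implied by the /1000 column (for example, 28 TGG codons correspond to 19.27 per 1000); the function name `codon_usage` is hypothetical.

```python
def codon_usage(counts, total_codons):
    """Per-codon frequency per 1000 codons and fraction within each amino acid.

    `counts` maps amino acid -> {codon: count}; `total_codons` is the length of
    the whole coding sequence in codons (including the stop codon).
    """
    table = {}
    for aa, by_codon in counts.items():
        aa_total = sum(by_codon.values())
        for codon, n in by_codon.items():
            per_1000 = 1000 * n / total_codons
            fraction = n / aa_total if aa_total else 0.0
            table[(aa, codon)] = (n, round(per_1000, 2), round(fraction, 2))
    return table


# Gly counts from Table 3 (synthetic gene).
gly = {"Gly": {"GGG": 7, "GGA": 1, "GGT": 0, "GGC": 74}}
for key, row in codon_usage(gly, 1453).items():
    print(key, row)
```

Running the same function on the Table 4 counts (for example Gly 12, 34, 16, 20) reproduces the native-gene rows, which makes the shift toward GGC in the synthetic gene easy to quantify.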



TABLE 4: Codon Frequency Table of the Native Factor VIII B Domain Deleted Gene

AA    Codon   Number   /1000  Fraction
Gly   GGG      12.00    8.26      0.15
Gly   GGA      34.00   23.40      0.41
Gly   GGT      16.00   11.01      0.20
Gly   GGC      20.00   13.76      0.24
Glu   GAG      33.00   22.71      0.39
Glu   GAA      51.00   35.10      0.61
Asp   GAT      55.00   37.85      0.67
Asp   GAC      27.00   18.58      0.33
Val   GTG      29.00   19.96      0.33
Val   GTA      19.00   13.08      0.22
Val   GTT      17.00   11.70      0.19
Val   GTC      23.00   15.83      0.26
Ala   GCG       2.00    1.38      0.03
Ala   GCA      18.00   12.39      0.25
Ala   GCT      31.00   21.34      0.44
Ala   GCC      20.00   13.76      0.28
Arg   AGG      18.00   12.39      0.25
Arg   AGA      22.00   15.14      0.30
Ser   AGT      22.00   15.14      0.18
Ser   AGC      24.00   16.52      0.20
Lys   AAG      32.00   22.02      0.40
Lys   AAA      48.00   33.04      0.60
Asn   AAT      38.00   26.15      0.60
Asn   AAC      25.00   17.21      0.40
Met   ATG      43.00   29.59      1.00
Ile   ATA      13.00    8.95      0.18
Ile   ATT      36.00   24.78      0.49
Ile   ATC      25.00   17.21      0.34
Thr   ACG       1.00    0.69      0.01
Thr   ACA      23.00   15.83      0.28
Thr   ACT      36.00   24.78      0.43
Thr   ACC      23.00   15.83      0.28
Trp   TGG      28.00   19.27      1.00
End   TGA       1.00    0.69      1.00
Cys   TGT       7.00    4.82      0.37
Cys   TGC      12.00    8.26      0.63
End   TAG       0.00    0.00      0.00
End   TAA       0.00    0.00      0.00
Tyr   TAT      41.00   28.22      0.60
Tyr   TAC      27.00   18.58      0.40
Leu   TTG      20.00   13.76      0.16
Leu   TTA      10.00    6.88      0.08
Phe   TTT      45.00   30.97      0.58
Phe   TTC      32.00   22.02      0.42
Ser   TCG       2.00    1.38      0.02
Ser   TCA      27.00   18.58      0.22
Ser   TCT      27.00   18.58      0.22
Ser   TCC      18.00   12.39      0.15
Arg   CGG       6.00    4.13      0.08
Arg   CGA      10.00    6.88      0.14
Arg   CGT       7.00    4.82      0.10
Arg   CGC      10.00    6.88      0.14
Gln   CAG      42.00   28.91      0.63
Gln   CAA      25.00   17.21      0.37
His   CAT      28.00   19.27      0.55
His   CAC      23.00   15.83      0.45
Leu   CTG      36.00   24.78      0.29
Leu   CTA      15.00   10.32      0.12
Leu   CTT      24.00   16.52      0.19
Leu   CTC      20.00   13.76      0.16
Pro   CCG       1.00    0.69      0.01
Pro   CCA      32.00   22.02      0.43
Pro   CCT      26.00   17.89      0.35
Pro   CCC      15.00   10.32      0.20



Use

The synthetic genes of the invention are useful for expressing a
protein normally expressed in mammalian cells in cell culture (e.g., for
commercial production of human proteins such as hGH, TPA, Factor VIII, and
Factor IX). The synthetic genes of the invention are also useful for gene
therapy. For example, a synthetic gene encoding a selected protein can be
introduced into a cell that can express the protein, creating a cell that can
be administered to a patient in need of the protein. Such cell-based gene
therapy techniques are well known to those skilled in the art; see, e.g.,
Anderson et al., U.S. Patent No. 5,399,349; Mulligan and Wilson, U.S. Patent
No. 5,460,959.



What is claimed is: 
1. A synthetic gene encoding a protein normally expressed in a
eukaryotic cell wherein at least one non-preferred or less preferred codon in a
natural gene encoding said protein has been replaced by a preferred codon
encoding the same amino acid, said synthetic gene being capable of expressing
said protein at a level which is at least 110% of that expressed by said natural
gene in an in vitro mammalian cell culture system under identical conditions.

2. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 150% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

3. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 200% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

4. The synthetic gene of claim 1 wherein said synthetic gene is
capable of expressing said protein at a level which is at least 500% of that
expressed by said natural gene in an in vitro cell culture system under identical
conditions.

5. The synthetic gene of claim 1 wherein said synthetic gene
comprises fewer than 5 occurrences of the sequence CG.

6. The synthetic gene of claim 1 wherein at least 10% of the codons
in said natural gene are non-preferred codons.

7. The synthetic gene of claim 1 wherein at least 50% of the codons
in said natural gene are non-preferred codons.

8. The synthetic gene of claim 1 wherein at least 50% of the non-
preferred codons and less preferred codons present in said natural gene have
been replaced by preferred codons.

9. The synthetic gene of claim 1 wherein at least 90% of the non-
preferred codons and less preferred codons present in said natural gene have
been replaced by preferred codons.

10. The synthetic gene of claim 1 wherein said protein is normally
expressed by a mammalian cell.

11. The synthetic gene of claim 1 wherein said protein is a retroviral
protein.

12. The synthetic gene of claim 1 wherein said protein is a lentiviral
protein.

13. The synthetic gene of claim 11 wherein said protein is an HIV
protein.

14. The synthetic gene of claim 13 wherein said protein is selected
from the group consisting of gag, pol, and env.

15. The synthetic gene of claim 13 wherein said protein is gp120.

16. The synthetic gene of claim 13 wherein said protein is gp160.

17. The synthetic gene of claim 1 wherein said protein is a human
protein.

18. The synthetic gene of claim 1 wherein said human protein is
Factor VIII.

19. The synthetic gene of claim 1 wherein 20% of the codons are
preferred codons.

20. The synthetic gene of claim 18 wherein said gene has the
coding sequence present in SEQ ID NO:42.

21. The synthetic gene of claim 1 wherein said protein is green
fluorescent protein.

22. The synthetic gene of claim 20 wherein said synthetic gene is
capable of expressing said green fluorescent protein at a level which is at least
200% of that expressed by said natural gene in an in vitro mammalian cell
culture system under identical conditions.

23. The synthetic gene of claim 20 wherein said synthetic gene is
capable of expressing said green fluorescent protein at a level which is at least
1000% of that expressed by said natural gene in an in vitro mammalian cell
culture system under identical conditions.

24. The synthetic gene of claim 21 having the sequence depicted
in Figure 11 (SEQ ID NO:40).

25. An expression vector comprising the synthetic gene of
claim 1.

26. The expression vector of claim 21, said expression vector
being a mammalian expression vector.

27. A mammalian cell harboring the synthetic gene of
claim 1.

28. A method for preparing a synthetic gene encoding a protein
normally expressed by mammalian cells, comprising identifying non-preferred
and less-preferred codons in the natural gene encoding said protein and
replacing one or more of said non-preferred and less-preferred codons with a
preferred codon encoding the same amino acid as the replaced codon.
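The method of claim 28 can be sketched as a codon-by-codon substitution. This is an illustrative implementation only: it uses the standard genetic code and the preferred codons listed in the Summary of the Invention. Glu has no preferred codon in that list, so this sketch leaves Glu codons (and Met, Trp and stop codons) unchanged, and it ignores other design constraints a real construct would need, such as the CG-dinucleotide limit of claim 5 or unwanted restriction sites. All names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Preferred codons as listed in the Summary of the Invention.
PREFERRED = {"A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
             "Q": "CAG", "G": "GGC", "H": "CAC", "I": "ATC", "L": "CTG",
             "K": "AAG", "P": "CCC", "F": "TTC", "S": "AGC", "T": "ACC",
             "Y": "TAC", "V": "GTG"}


def translate(cds):
    cds = cds.upper()
    return "".join(CODE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def recode(cds):
    """Replace each codon by the preferred codon for the same amino acid.

    Returns (new sequence, number of codons changed)."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out, changed = [], 0
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        new = PREFERRED.get(CODE[codon], codon)  # unlisted residues kept
        if new != codon:
            changed += 1
        out.append(new)
    return "".join(out), changed


native = "GCTTTAGAAAGATGG"  # Ala Leu Glu Arg Trp (illustrative only)
synthetic, n = recode(native)
print(synthetic, n)  # GCCCTGGAACGCTGG 3
```

Because every substitution stays within the same amino acid's codon family, `translate(native)` and `translate(synthetic)` are identical, which is the property the claim requires.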
1/18

Syngp120mn

1 CTCGAGATCC ATTGTGCTCT AAAGGAGATA CCCGGCCAGA CACCCTCACC
51 TGCGGTGCCC AGCTGCCCAG GCTGAGGCAA GAGAAGGCCA GAAACCATGC
101 CCATGGGGTC TCTGCAACCG CTGGCCACCT TGTACCTGCT GGGGATGCTG
151 GTCGCTTCCG TGCTAGCCAC CGAGAAGCTG TGGGTGACCG TGTACTACGG
201 CGTGCCCGTG TGGAAGGAGG CCACCACCAC CCTGTTCTGC GCGAGCGACG
251 CCAAGGCGTA CGACACCGAG GTGCACAACG TGTGGGCCAC CCAGGCGTGC
301 GTGCCCACCG ACCCCAACCC CCAGGAGGTG GAGCTCGTGA ACGTGACCGA
351 GAACTTGAAC ATGTGGAAGA ACAACATGGT GGAGCAGATG CATGAGGACA
401 TCATCAGCCT GTGGGACCAG AGCCTGAAGC CCTGCGTGAA GCTGACCCCC
451 CTGTGCGTGA CCCTGAACTG CACCGACCTG AGGAACACCA CCAACACCAA
501 CAACAGCACC GCCAACAACA ACAGCAACAG CGAGGGCACC ATCAAGGGCG
551 GCGAGATGAA GAACTGCAGC TTCAACATCA CCACCAGCAT
601 ATGCAGAAGG AGTACGCCCT GCTGTACAAG CTGGATATCG TGAGCATCGA
651 CAACGACAGC ACCAGCTACC GCCTGATCTC CTGCAACACC AGCGTGATCA
701 CCCAGGCCTG CCCCAAGATC AGCTTCGAGC CCATCCCCAT CCACTACTGC
751 GCCCCCGCCG GCTTCGCCAT CCTGAAGTGC AACGACAAGA AGTTCAGCGG
801 CAAGGGCAGC TGCAAGAACG TGAGCACCGT GCAGTGCACC CACGGCATCC
851 GGCCGGTGGT GAGCACCCAG CTCCTGCTGA ACGGCAGCCT GGCCGAGGAG
901 GAGGTGGTGA TCCGCAGCGA GAACTTCACC GACAACGCCA AGACCATCAT
951 CGTGCACCTG AATGAGAGCG TGCAGATCAA CTGCACGCGT CCCAACTACA
1001 ACAAGCGCAA GCGCATCCAC ATCGGCCCCG GGCGCGCCTT CTACACCACC
1051 AAGAACATCA TCGGCACCAT CCGCCAGGCC CACTGCAACA TCTCTAGAGC
1101 GACACCCTGC GCCAGATCGT
1151 TCAAGAACAA GACCATCGTG TTCAACCAGA GCAGCGGCGG CGACCCCGAG
1201 ATCGTGATGC ACAGCTTCAA CTGCGGCGGC GAATTCTTCT ACTGCAACAC
1251 CAGCCCCCTG TTCAACAGCA CCTGGAACGG CAACAACACC TGGAACAACA
1301 CCACCGGCAG CAACAACAAT ATTACCCTCC AGTGCAAGAT CAAGCAGATC
1351 ATCAACATGT GGCAGGAGGT GGGCAAGGCC ATGTACGCCC CCCCCATCGA
1401 GGGCCAGATC CGGTGCAGCA GCAACATCAC CGGTCTGCTG CTGACCCGCG
1451 ACGGCGGCAA GGACACCGAC ACCAACGACA CCGAAATCTT CCGCCCCGGC
1501 GGCGGCGACA TGCGCGACAA CTGGAGATCT GAGCTGTACA AGTACAAGGT
1551 GGTGACGATC GAGCCCCTGG GCGTGGCCCC CACCAAGGCC AAGCGCCGCG
1601 TGGTGCAGCG CGAGAAGCGC TAAAGCGGCC GC (SEQ ID NO: 34)
1 ACCGAGAAGC TGTGGGTGAC CGTGTACTAC GGCGTGCCCG TGTGGAAGGA
51 GGCCACCACC ACCCTGTTCT GCGCCAGCGA CGCCAAGGCG TACGACACCG
101 AGGTGCACAA CGTGTGGGCC ACCCAGGCGT GCGTGCCCAC CGACCCCAAC
151 CCCCAGGAGG TGGAGCTCGT GAACGTGACC GAGAACTTCA ACATGTGGAA
201 GAACAACATG GTGGAGCAGA TGCATGAGGA CATCATCAGC CTGTGGGACC
251 AGAGCCTGAA GCCCTGCGTG AAGCTGACCC CCCTGTGCGT GACCCTGAAC
301 TGCACCGACC TGAGGAACAC CACCAACACC AACAACAGCA CCGCCAACAA
351 CAACAGCAAC AGCGAGGGCA CCATCAAGGG CGGCGAGATG AAGAACTGCA
401 GCTTCAACAT CACCACCAGC ATCCGCGACA AGATGCAGAA GGAGTACGCC
451 CTGCTGTACA AGCTGGATAT CGTGAGCATC GACAACGACA GCACCAGCTA
501 CCGCCTGATC TCCTGCAACA CCAGCGTGAT CACCCAGGCC TGCCCCAAGA
551 TCAGCTTCGA GCCCATCCCC ATCCACTACT GCGCCCCCGC CGGCTTCGCC
601 ATCCTGAAGT GCAACGACAA GAAGTTCAGC GGCAAGGGCA GCTGCAAGAA
651 CGTGAGCACC GTGCAGTGCA CCCACGGCAT CCGGCCGGTG GTGAGCACCC
701 AGCTCCTGCT GAACGGCAGC CTGGCCGAGG AGGAGGTGGT GATCCGCAGC
751 GAGAACTTCA CCGACAACGC CAAGACCATC ATCGTGCACC TGAATGAGAG
801 CGTGCAGATC AACTGCACGC GTCCCAACTA CAACAAGCGC AAGCGCATCC
851 ACATCGGCCC CGGGCGCGCC TTCTACACCA CCAAGAACAT CATCGGCACC
901 ATCCGCCAGG CCCACTGCAA CATCTCTAGA GCCAAGTGGA ACGACACCCT
951 GCGCCAGATC GTGAGCAAGC TGAAGGAGCA GTTCAAGAAC AAGACCATCG
1001 TGTTCAACCA GAGCAGCGGC GGCGACCCCG AGATCGTGAT GCACAGCTTC
1051 AACTGCGGCG GCGAATTCTT CTACTGCAAC ACCAGCCCCC TGTTCAACAG
1101 CACCTGGAAC GGCAACAACA CCTGGAACAA CACCACCGGC AGCAACAACA
1151 ATATTACCCT CCAGTGCAAG ATCAAGCAGA TCATCAACAT GTGGCAGGAG
1201 GTGGGCAAGG CCATGTACGC CCCCCCCATC GAGGGCCAGA TCCGGTGCAG
1251 CAGCAACATC ACCGGTCTGC TGCTGACCCG CGACGGCGGC AAGGACACCG
1301 ACACCAACGA CACCGAAATC TTCCGCCCCG GCGGCGGCGA CATGCGCGAC
1351 AACTGGAGAT CTGAGCTGTA CAAGTACAAG GTGGTGACGA TCGAGCCCCT
1401 GGGCGTGGCC CCCACCAAGG CCAAGCGCCG CGTGGTGCAG CGCGAGAAGC
1451 GGGCCGCCAT CGGCGCCCTG TTCCTGGGCT TCCTGGGGGC GGCGGGCAGC

Fig. 1C
1501 ACCATGGGGG CCGCCAGCGT GACCCTGACC GTGCAGGCCC GCCTGCTCCT
1551 GAGCGGCATC GTGCAGCAGC AGAACAACCT CCTCCGCGCC ATCGAGGCCC
1601 AGCAGCATAT GCTCCAGCTC ACCGTGTGGG GCATCAAGCA GCTCCAGGCC
1651 CGCGTGCTGG CCGTGGAGCG CTACCTGAAG GACCAGCAGC TCCTGGGCTT
1701 CTGGGGCTGC TCCGGCAAGC TGATCTGCAC CACCACGGTA CCCTGGAACG
1751 CCTCCTGGAG CAACAAGAGC CTGGACGACA TCTGGAACAA CATGACCTGG
1801 ATGCAGTGGG AGCGCGAGAT CGATAACTAC ACCAGCCTGA TCTACAGCCT
1851 GCTGGAGAAG AGCCAGACCC AGCAGGAGAA GAACGAGCAG GAGCTGCTGG
1901 AGCTGGACAA GTGGGCGAGC CTGTGGAACT GGTTCGACAT CACCAACTGG
1951 CTGTGGTACA TCAAAATCTT CATCATGATT GTGGGCGGCC TGGTGGGCCT
2001 CCGCATCGTG TTCGCCGTGC TGAGCATCGT GAACCGCGTG CGCCAGGGCT
2051 ACAGCCCCCT GAGCCTCCAG ACCCGGCCCC CCGTGCCGCG CGGGCCCGAC
2101 CGCCCCGAGG GCATCGAGGA GGAGGGCGGC GAGCGCGACC GCGACACCAG
2151 CGGCAGGCTC GTGCACGGCT TCCTGGCGAT CATCTGGGTC GACCTCCGGA
2201 GCCTGTTCCT GTTCAGCTAC CACCACCGCG ACCTGCTGCT GATCGCCGCC
2251 CGCATCGTGG AACTCCTAGG CCGCCGCGGC TGGGAGGTGC TGAAGTACTG
2301 GTGGAACCTC CTCCAGTATT GGAGCCAGGA GCTGAAGTCC AGCGCCGTGA
2351 GCCTGCTGAA CGCCACCGCC ATCGCCGTGG CCGAGGGCAC CGACCGCGTG
2401 ATCGAGGTGC TCCAGAGGGC CGGGAGGGCG ATCCTGCACA TCCCCACCCG
2451 CATCCGCCAG GGGCTCGAGA GGGCGCTGCT G (SEQ ID NO: 35)
1 GAATTCACGC GTAAGCTTGC CGCCACCATG GTGAGCAAGG GCGAGGAGCT
51 GTTCACCGGG GTGGTGCCCA TCCTGGTCGA GCTGGACGGC GACGTGAACG
101 GCCACAAGTT CAGCGTGTCC GGCGAGGGCG AGGGCGATGC CACCTACGGC
151 AAGCTGACCC TGAAGTTCAT CTGCACCACC GGCAAGCTGC CCGTGCCCTG 
201 GCCCACCCTC GTGACCACCT TCAGCTACGG CGTGCAGTGC TTCAGCCGCT 
251 ACCCCGACCA CATGAAGCAG CACGACTTCT TCAAGTCCGC CATGCCCGAA 
301 GGCTACGTCC AGGAGCGCAC CATCTTCTTC AAGGACGACG GCAACTACAA 
351 GACCCGCGCC GAGGTGAAGT TCGAGGGCGA CACCCTGGTG AACCGCATCG 
401 AGCTGAAGGG CATCGACTTC AAGGAGGACG GCAACATCCT GGGGCACAAG 
451 CTGGAGTACA ACTACAACAG CCACAACGTC TATATCATGG CCGACAAGCA 
501 GAAGAACGGC ATCAAGGTGA ACTTCAAGAT CCGCCACAAC ATCGAGGACG 
551 GCAGCGTGCA GCTCGCCGAC CACTACCAGC AGAACACCCC CATCGGCGAC 
601 GGCCCCGTGC TGCTGCCCGA CAACCACTAC CTGAGCACCC AGTCCGCCCT 
651 GAGCAAAGAC CCCAACGAGA AGCGCGATCA CATGGTCCTG CTGGAGTTCG 
701 TGACCGCCGC CGGGATCACT CACGGCATGG ACGAGCTGTA CAAGTAAAGC 
751 GGCCGCGGAT CC 



Fig. 11 
1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC
51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCA GAAGATACTA
101 CCTGGGTGCA GTGGAACTGT CATGGGACTA TATGCAAAGT GATCTCGGTG
151 AGCTGCCTGT GGACGCAAGA TTTCCTCCTA GAGTGCCAAA ATCTTTTCCA
201 TTCAACACCT CAGTCGTGTA CAAAAAGACT CTGTTTGTAG AATTCACGGA
251 TCACCTTTTC AACATCGCTA AGCCAAGGCC ACCCTGGATG GGTCTGCTAG
301 GTCCTACCAT CCAGGCTGAG GTTTATGATA CAGTGGTCAT TACACTTAAG
351 AACATGGCTT CCCATCCTGT CAGTCTTCAT GCTGTTGGTG TATCCTACTG
401 GAAAGCTTCT GAGGGAGCTG AATATGATGA TCAGACCAGT CAAAGGGAGA
451 AAGAAGATGA TAAAGTCTTC CCTGGTGGAA GCCATACATA TGTCTGGCAG
501 GTCCTGAAAG AGAATGGTCC AATGGCCTCT GACCCACTGT GCCTTACCTA
551 CTCATATCTT TCTCATGTGG ACCTGGTAAA AGACTTGAAT TCAGGCCTCA
601 TTGGAGCCCT ACTAGTATGT AGAGAAGGGA GTCTGGCCAA GGAAAAGACA
651 CAGACCTTGC ACAAATTTAT ACTACTTTTT GCTGTATTTG ATGAAGGGAA
701 AAGTTGGCAC TCAGAAACAA AGAACTCCTT GATGCAGGAT AGGGATGCTG
751 CATCTGCTCG GGCCTGGCCT AAAATGCACA CAGTCAATGG TTATGTAAAC
801 AGGTCTCTGC CAGGTCTGAT TGGATGCCAC AGGAAATCAG TCTATTGGCA
851 TGTGATTGGA ATGGGCACCA CTCCTGAAGT GCACTCAATA TTCCTCGAAG
901 GTCACACATT TCTTGTGAGG AACCATCGCC AGGCGTCCTT GGAAATCTCG
951 CCAATAACTT TCCTTACTGC TCAAACACTC TTGATGGACC TTGGACAGTT
1001 TCTACTGTTT TGTCATATCT CTTCCCACCA ACATGATGGC ATGGAAGCTT
1051 ATGTCAAAGT AGACAGCTGT CCAGAGGAAC CCCAACTACG AATGAAAAAT
1101 AATGAAGAAG CGGAAGACTA TGATGATGAT CTTACTGATT CTGAAATGGA
1151 TGTGGTCAGG TTTGATGATG ACAACTCTCC TTCCTTTATC CAAATTCGCT
1201 CAGTTGCCAA GAAGCATCCT AAAACTTGGG TACATTACAT TGCTGCTGAA
1251 GAGGAGGACT GGGACTATGC TCCCTTAGTC CTCGCCCCCG ATGACAGAAG
1301 TTATAAAAGT CAATATTTGA ACAATGGCCC TCAGCGGATT GGTAGGAAGT
1351 ACAAAAAAGT CCGATTTATG GCATACACAG ATGAAACCTT TAAGACTCGT
1401 GAAGCTATTC AGCATGAATC AGGAATCTTG GGACCTTTAC TTTATGGGGA
1451 AGTTGGAGAC ACACTGTTGA TTATATTTAA GAATCAAGCA AGCAGACCAT
1501 ATAACATCTA CCCTCACGGA ATCACTGATG TCCGTCCTTT GTATTCAAGG
1551 AGATTACCAA AAGGTGTAAA ACATTTGAAG GATTTTCCAA TTCTGCCAGG
1601 AGAAATATTC AAATATAAAT GGACAGTGAC TGTAGAAGAT GGGCCAACTA
1651 AATCAGATCC TCGGTGCCTG ACCCGCTATT ACTCTAGTTT CGTTAATATG
1701 GAGAGAGATC TAGCTTCAGG ACTCATTGGC CCTCTCCTCA TCTGCTACAA
1751 AGAATCTGTA GATCAAAGAG GAAACCAGAT AATGTCAGAC AAGAGGAATG
1801 TCATCCTGTT TTCTGTATTT GATGAGAACC GAAGCTGGTA CCTCACAGAG
1851 AATATACAAC GCTTTCTCCC CAATCCAGCT GGAGTGCAGC TTGAGGATCC
1901 AGAGTTCCAA GCCTCCAACA TCATGCACAG CATCAATGGC TATGTTTTTG
1951 ATAGTTTGCA GTTGTCAGTT TGTTTGCATG AGGTGGCATA CTGGTACATT
2001 CTAAGCATTG GAGCACAGAC TGACTTCCTT TCTGTCTTCT TCTCTGGATA
2051 TACCTTCAAA CACAAAATGG TCTATGAAGA CACACTCACC CTATTCCCAT
2101 TCTCAGGAGA AACTGTCTTC ATGTCGATGG AAAACCCAGG TCTATGGATT
2151 CTGGGGTGCC ACAACTCAGA CTTTCGGAAC AGAGGCATGA CCGCCTTACT
2201 GAAGGTTTCT AGTTGTGACA AGAACACTGG TGATTATTAC GAGGACAGTT
2251 ATGAAGATAT TTCAGCATAC TTGCTGAGTA AAAACAATGC CATTGAACCA
2301 AGAAGCTTCT CCCAGAATTC AAGACACCCT AGCACTAGGC AAAAGCAATT
2351 TAATGCCACC CCACCAGTCT TGAAACGCCA TCAACGGGAA ATAACTCGTA
2401 CTACTCTTCA GTCAGATCAA GAGGAAATTG ACTATGATGA TACCATATCA
2451 GTTGAAATGA AGAAGGAAGA TTTTGACATT TATGATGAGG ATGAAAATCA
2501 GAGCCCCCGC AGCTTTCAAA AGAAAACACG ACACTATTTT ATTGCTGCAG
2551 TGGAGAGGCT CTGGGATTAT GGGATGAGTA GCTCCCCACA TGTTCTAAGA
2601 AACAGGGCTC AGAGTGGCAG TGTCCCTCAG TTCAAGAAAG TTGTTTTCCA
2651 GGAATTTACT GATGGCTCCT TTACTCAGCC CTTATACCGT GGAGAACTAA

Fig. 12A 
2701 ATGAACATTT GGGACTCCTG GGGCCATATA TAAGAGCAGA AGTTGAAGAT
2751 AATATCATGG TAACTTTCAG AAATCAGGCC TCTCGTCCCT ATTCCTTCTA
2801 TTCTAGCCTT ATTTCTTATG AGGAAGATCA GAGGCAAGGA GCAGAACCTA
2851 GAAAAAACTT TGTCAAGCCT AATGAAACCA AAACTTACTT TTGGAAAGTG
2901 CAACATCATA TGGCACCCAC TAAAGATGAG TTTGACTGCA AAGCCTGGGC
2951 TTATTTCTCT GATGTTGACC TGGAAAAAGA TGTGCACTCA GGCCTGATTG
3001 GACCCCTTCT GGTCTGCCAC ACTAACACAC TGAACCCTGC TCATGGGAGA
3051 CAAGTGACAG TACAGGAATT TGGTCTGTTT TTCACCATCT TTGATGAGAC
3101 CAAAAGCTGG TACTTCACTG AAAATATGGA AAGAAACTGC AGGGCTCCCT
3151 GCAATATCCA GATGGAAGAT CCCACTTTTA AAGAGAATTA TCGCTTCCAT
3201 GCAATCAATG GCTACATAAT GGATACACTA CCTGGCTTAG TAATGGCTCA
3251 GGATCAAAGG ATTCGATGGT ATCTGCTCAG CATGGGCAGC AATGAAAACA
3301 TCCATTCTAT TCATTTCAGT GGACATGTGT TCACTGTACG AAAAAAAGAG
3351 GAGTATAAAA TGGCACTGTA CAATCTCTAT CCAGGTGTTT TTGAGACAGT
3401 GGAAATGTTA CCATCCAAAG CTGGAATTTG GCGGGTGGAA TGCCTTATTG
3451 GCGAGCATCT ACATGCTGGG ATGAGCACAC TTTTTCTGGT GTACAGCAAT
3501 AAGTGTCAGA CTCCCCTGGG AATGGCTTCT GGACACATTA GAGATTTTCA
3551 GATTACAGCT TCAGGACAAT ATGGACAGTG GGCCCCAAAG CTGGCCAGAC
3601 TTCATTATTC CGGATCAATC AATGCCTGGA GCACCAAGGA GCCCTTTTCT
3651 TGGATCAAGG TGGATCTGTT GGCACCAATG ATTATTCACG GCATCAAGAC
3701 CCAGGGTGCC CGTCAGAAGT TCTCCAGCCT CTACATCTCT CAGTTTATCA
3751 TCATGTATAG TCTTGATGGG AAGAAGTGGC AGACTTATCG AGGAAATTCC
3801 ACTGGAACCT TAATGGTCTT CTTTGGCAAT GTGGATTCAT CTGGGATAAA
3851 ACACAATATT TTTAACCCTC CAATTATTGC TCGATACATC CGTTTGCACC
3901 CAACTCATTA TAGCATTCGC AGCACTCTTC GCATGGAGTT GATGGGCTGT
3951 GATTTAAATA GTTGCAGCAT GCCATTGGGA ATGGAGAGTA AAGCAATATC
4001 AGATGCACAG ATTACTGCTT CATCCTACTT TACCAATATG TTTGCCACCT
4051 GGTCTCCTTC AAAAGCTCGA CTTCACCTCC AAGGGAGGAG TAATGCCTGG
4101 AGACCTCAGG TGAATAATCC AAAAGAGTGG CTGCAAGTGG ACTTCCAGAA
4151 GACAATGAAA GTCACAGGAG TAACTACTCA GGGAGTAAAA TCTCTGCTTA
4201 CCAGCATGTA TGTGAAGGAG TTCCTCATCT CCAGCAGTCA AGATGGCCAT
4251 CAGTGGACTC TCTTTTTTCA GAATGGCAAA GTAAAGGTTT TTCAGGGAAA
4301 TCAAGACTCC TTCACACCTG TGGTGAACTC TCTAGACCCA CCGTTACTGA
4351 CTCGCTACCT TCGAATTCAC CCCCAGAGTT GGGTGCACCA GATTGCCCTG
4401 AGGATGGAGG TTCTGGGCTG CGAGGCACAG GACCTCTACT GAGGGTGGCC
4451 ACTGCAGCAC CTGCCACTGC CGTCACCTCT CCCTCCTCAG CTCCAGGGCA
4501 GTGTCCCTCC CTGGCTTGCC TTCTACCTTT GTGCTAAATC CTAGCAGACA
4551 CTGCCTTGAA GCCTCCTGAA TTAACTATCA TCAGTCCTGC ATTTCTTTGG
4601 TGGGGGGCCA GGAGGGTGCA TCCAATTTAA CTTAACTCTT ACCGTCGACC
4651 TGCAGGCCCA ACGCGGCCGC
1 AAGCTTAAAC CATGCCCATG GGGTCTCTGC AACCGCTGGC CACCTTGTAC
51 CTGCTGGGGA TGCTGGTCGC TTCCGTGCTA GCCGCCACCC GCCGCTACTA
101 CCTGGGCGCC GTGGAGCTGT CCTGGGACTA CATGCAGAGC GACCTGGGCG
151 AGCTCCCCGT GGACGCCCGC TTCCCCCCCC GCGTGCCCAA GAGCTTCCCC
201 TTCAACACCA GCGTGGTGTA CAAGAAAACC CTGTTCGTGG AGTTCACCGA
251 CCACCTGTTC AACATTGCCA AGCCGCGCCC CCCCTGGATG GGCCTGCTGG
301 GCCCCACCAT CCAGGCCGAG GTGTACGACA CCGTGGTGAT CACCCTGAAG
351 AACATGGCCA GCCACCCCGT CAGCCTGCAC GCCGTGGGCG TGAGCTACTG
401 GAAGGCCAGC GAGGGCGCCG AGTACGACGA CCAGACGTCC CAGCGCGAGA
451 AGGAGGACGA CAAGGTGTTC CCGGGGGGGA GCCACACCTA CGTGTGGCAG
501 GTGCTTAAGG AGAACGGCCC TATGGCCAGC GACCCCCTGT GCCTGACCTA
551 CAGCTACCTG AGCCACGTGG ACCTGGTGAA GGATCTGAAC AGCGGGCTGA
601 TCGGCGCCCT GCTGGTGTGT CGCGAGGGCA GCCTGGCCAA GGAGAAAACC
651 CAGACCCTGC ACAAGTTCAT CCTGCTGTTC GCCGTGTTCG ACGAGGGGAA
701 GAGCTGGCAC AGCGAGACTA AGAACAGCCT GATGCAGGAC CGCGACGCCG
751 CCAGCGCCCG CGCCTGGCCC AAGATGCACA CCGTTAACGG CTACGTGAAC
801 CGCAGCCTGC CCGGCCTGAT CGGCTGCCAC CGCAAGAGCG TGTACTGGCA
851 CGTCATCGGC ATGGGCACCA CCCCTGAGGT GCACAGCATC TTCCTGGAGG
901 GCCACACCTT CCTGGTGCGC AACCACCGCC AGGCCAGCCT GGAGATCAGC
951 CCCATCACCT TCCTGACTGC CCAGACCCTG CTGATGGACC TAGGCCAGTT
1001 CCTGCTGTTC TGCCACATCA GCAGCCACCA GCACGACGGC ATGGAGGCTT
1051 ACGTGAAGGT GGACAGCTGC CCCGAGGAGC CCCAGCTGCG CATGAAGAAC
1101 AACGAGGAGG CCGAGGACTA CGACGACGAC CTGACCGACA GCGAGATGGA
1151 TGTCGTACGC TTCGACGACG ACAACAGCCC CAGCTTCATC CAGATCCGCA
1201 GCGTGGCCAA GAAGCACCCT AAGACCTGGG TGCACTACAT CGCCGCCGAG
1251 GAGGAGGACT GGGACTACGC CCCGCTAGTA CTGGCCCCCG ACGACCGCAG
1301 CTACAAGAGC CAGTACCTGA ACAACGGCCC CCAGCGCATC GGCCGCAAGT
1351 ACAAGAAGGT GCGCTTCATG GCCTACACCG ACGAGACTTT CAAGACCCGC
1401 GAGGCCATCC AGCACGAGTC CGGCATCCTC GGCCCCCTGC TGTACGGCGA
1451 GGTGGGCGAC ACCCTGCTGA TCATCTTCAA GAACCAGGCC AGCAGGCCCT
1501 ACAACATCTA CCCCCACGGC ATCACCGACG TGCGCCCCCT GTACAGCCGC
1551 CGCCTGCCCA AGGGCGTGAA GCACCTGAAG GACTTCCCCA TCCTGCCCGG
1601 CGAGATCTTC AAGTACAAGT GGACCGTGAC CGTGGAGGAC GGCCCCACCA
1651 AGAGCGACCC CCGCTGCCTG ACCCGCTACT ACAGCAGCTT CGTGAACATG
1701 GAGCGCGACC TGGCCTCCGG ACTGATCGGC CCCCTGCTGA TCTGCTACAA
1751 GGAGAGCGTG GACCAGCGCG GCAACCAGAT CATGAGCGAC AAGCGCAACG
1801 TGATCCTGTT CAGCGTGTTC GACGAGAACC GCAGCTGGTA TCTGACCGAG
1851 AACATCCAGC GCTTCCTGCC CAACCCCGCT GGCGTGCAGC TGGAAGATCC
1901 CGAGTTCCAG GCCAGCAACA TCATGCACAG CATCAACGGC TACGTGTTCG
1951 ACAGCCTGCA GCTGAGCGTG TGCCTGCATG AGGTGGCCTA CTGGTACATC
2001 CTGAGCATCG GCGCCCAGAC CGACTTCCTG AGCGTGTTCT TCTCCGGGTA
2051 TACCTTCAAG CACAAGATGG TGTACGAGGA CACCCTGACC CTGTTCCCCT
2101 TCTCCGGCGA GACTGTGTTC ATGTCTATGG AGAACCCCGG CCTGTGGATT
2151 CTGGGCTGCC ACAACAGCGA CTTCCGCAAC CGCGGCATGA CTGCCCTGCT
2201 GAAAGTCTCC AGCTGCGACA AGAACACCGG CGACTACTAC GAGGACAGCT
2251 ACGAGGACAT CTCCGCCTAC CTGCTGTCCA AGAACAACGC CATCGAGCCC
2301 CGCTCCTTCT CCCAAAACTC CCGCCACCCC AGCACGCGTC AGAAGCAGTT
2351 CAACGCCACC CCCCCCGTGC TGAAGCGCCA CCAGCGCGAG ATCACCCGCA
2401 CCACCCTGCA AAGCGACCAG GAGGAGATCG ACTACGACGA CACCATCAGC
2451 GTGGAGATGA AGAAGGAGGA CTTCGACATC TACGACGAGG ACGAGAACCA
2501 GAGCCCCCGC TCCTTCCAAA AGAAAACCCG CCACTACTTC ATCGCCGCCG
2551 TGGAGCGCCT GTGGGACTAC GGCATGAGCA GCAGCCCCCA CGTCCTGCGC
2601 AACCGCGCCC AGAGCGGCAG CGTGCCCCAG TTCAAGAAGG TGGTGTTCCA
2651 GGAGTTCACC GACGGCAGCT TCACCCAGCC CCTGTACCGC GGCGAGCTGA
2701 ACGAGCACCT GGGCCTGCTC GGCCCCTACA TCCGCGCCGA GGTGGAGGAC
2751 AACATCATGG TGACCTTCCG CAACCAAGCC TCCCGGCCCT ACTCCTTCTA
2801 CTCCTCCCTG ATCAGCTACG AGGAGGACCA GCGCCAGGGC GCCGAGCCCC
2851 GCAAGAACTT CGTGAAGCCC AACGAGACTA AGACCTACTT CTGGAAGGTG
2901 CAGCACCACA TGGCCCCCAC CAAGGACGAG TTCGACTGCA AGGCCTGGGC
2951 CTACTTCAGC GACGTGGACC TGGAGAAGGA CGTGCACAGC GGCCTGATCG
3001 GCCCCCTGCT GGTGTGCCAC ACCAACACCC TGAACCCCCC CCACGGGAGG
3051 CAGGTGACTG TGCAGGAATT TGCCCTGTTC TTCACCATCT TCGACGAGAC
3101 TAAGAGCTGG TACTTCACCG AGAACATGGA GGGCAACTGC CGCGCCCCCT
3151 GCAACATCCA GATGGAAGAT CCCACCTTCA AGGAGAACTA CCGCTTCCAC
3201 GCCATCAACG GCTACATCAT GGACACCCTG CCCGGCCTGG TGATGGCCCA
3251 GGACCAGCGC ATCCGCTGGT ACCTGCTGTC TATGGGCAGC AACGAGAACA
3301 TCCACAGCAT CCACTTCAGC GGCCACGTGT TCACCGTGCG CAAGAAGGAG
3351 GAGTACAAGA TGGCCCTGTA CAACCTGTAC CCCGGCGTGT TCGAGACTGT
3401 GGAGATGCTG CCCAGCAAGG CCGGGATCTG GCGCGTGGAG TGCCTGATCG
3451 GCGAGCACCT GCACGCCGGC ATGAGCACCC TGTTCCTGGT GTACAGCAAC
3501 AAGTGCCAGA CCCCCCTGGG CATGGCCAGC GGCCACATCC GCGACTTCCA
3551 GATCACCGCC AGCGGCCAGT ACGGCCAGTG GGCTCCCAAG CTGGCCCGCC
3601 TGCACTACAG CGGCAGCATC AACGCCTGGT CGACCAAGGA GCCCTTCTCC
3651 TGGATCAAGG TGGACCTGCT GGCCCCCATG ATCATCCACG GCATCAAGAC
3701 CCAGGGCGCC CGCCAGAAGT TCAGCAGCCT GTACATCAGC CAGTTCATCA
3751 TCATGTACTC TCTAGACGGC AAGAAGTGGC AGACCTACCG CGGCAACAGC
3801 ACCGGCACCC TGATGGTGTT CTTCGGCAAC GTGGACAGCA GCGGCATCAA
3851 GCACAACATC TTCAACCCCC CCATCATCGC CCGCTACATC CGCCTGCACC
3901 CCACCCACTA CAGCATCCGC AGCACCCTGC GCATGGAGCT GATGGGCTGC
3951 GACCTGAACA GCTGCAGCAT GCCCCTGGGC ATGGAGAGCA AGGCCATCAG
4001 CGACGCCCAG ATCACCGCCT CCAGCTACTT CACCAACATG TTCGCCACCT
4051 GGAGCCCCAG CAAGGCCCGC CTGCACCTGC AGGGCCGCAG CAACGCCTGG
4101 CGCCCCCAGG TGAACAACCC CAAGGAGTGG CTGCAGGTGG ACTTCCAGAA
4151 AACCATGAAG GTGACTGGCG TGACCACCCA GGGCGTCAAG AGCCTGCTGA
4201 CCAGCATGTA CGTGAAGGAG TTCCTGATCA GCAGCAGCCA GGACGGCCAC
4251 CAGTGGACCC TGTTCTTCCA AAACGGCAAG GTGAAGGTGT TCCAGGGCAA
4301 CCAGGACAGC TTCACACCGG TCGTGAACAG CCTGGACCCC CCCCTGCTGA
4351 CCCGCTACCT GCGCATCCAC CCCCAGAGCT GGGTGCACCA GATCGCCCTG
4401 CGCATGGAGG TGCTGGGCTG CGAGGCCCAG GACCTGTACT GAAGCGGCCG
4451 C